An extended version of the KoKo German L1 Learner corpus

This paper describes an extended version of the KoKo corpus (version KoKo4, Dec 2015), a corpus of written German L1 learner texts from …

Andrea Abel, Aivars Glaznieks, Lionel Nicolas, Egon Stemle

bot.zen @ EVALITA 2016 - A minimally-deep learning PoS-tagger (trained for Italian Tweets)

This article describes the system that participated in the POS tagging for Italian Social Media Texts (PoSTWITA) task of the 5th …

The DiDi Corpus of South Tyrolean CMC Data: A multilingual corpus of Facebook texts

The DiDi corpus of South Tyrolean data of computer-mediated communication (CMC) is a multilingual sociolinguistic language corpus. It …

Jennifer-Carmen Frey, Aivars Glaznieks, Egon W. Stemle

Integrating corpora of computer-mediated communication into the language resources landscape: Initiatives and best practices from French, German, Italian and Slovenian projects

Michael Beißwenger, Thierry Chanier, Isabella Chiari, Tomaž Erjavec, Darja Fišer, Axel Herold, Nikola Ljubešić, Harald Lüngen, Céline Poudat, Egon Stemle, Angelika Storrer, Ciara Wigham

bot.zen @ EmpiriST 2015 - A minimally-deep learning PoS-tagger (trained for German CMC and Web data)

This article describes the system that participated in the Part-of-speech tagging subtask of the “EmpiriST 2015 shared task on …

The DiDi Corpus of South Tyrolean CMC Data

This paper presents the DiDi Corpus, a corpus of South Tyrolean Data of Computer-mediated Communication (CMC). The corpus comprises …

Jennifer-Carmen Frey, Aivars Glaznieks, Egon W. Stemle

Correcting OCR errors for German in Fraktur font

In this paper, we present ongoing experiments for correcting OCR errors on German newspapers in Fraktur font. Our approach borrows from …

Michel Généreux, Egon W. Stemle, Lionel Nicolas, Verena Lyding

Collecting language data of non-public social media profiles

In this paper, we propose an integrated web strategy for mixed sociolinguistic research methodologies in the context of social media …

Jennifer-Carmen Frey, Egon W. Stemle, Aivars Glaznieks

'interHist' - an interactive visual interface for corpus exploration

In this article, we present interHist, a compact visualization for the interactive exploration of results to complex corpus queries. …

Verena Lyding, Lionel Nicolas, Egon Stemle

KoKo: An L1 Learner Corpus for German

We introduce the KoKo corpus, a collection of German L1 learner texts annotated with learner errors, along with the methods and tools …

Andrea Abel, Aivars Glaznieks, Lionel Nicolas, Egon Stemle