Working together towards an ideal infrastructure for language learner corpora

In this article we give an overview of first-hand experiences and starting points for best practices from projects in seven European countries dedicated to learner corpus research and the creation of language learner corpora. The corpora and tools …

Was darf Forschung mit Social Media Daten? [What may research do with social media data?]

Interview in Academia (science magazine by EURAC and unibz), Bolzano, Italy

The DiDi Corpus of South Tyrolean CMC Data

This paper presents the DiDi Corpus, a corpus of South Tyrolean Data of Computer-mediated Communication (CMC). The corpus comprises around 650,000 tokens from Facebook wall posts, comments on wall posts and private messages, as well as …

Automated L1 identification in English learner essays and its implications for language transfer

This article focuses on automatic text classification which aims at identifying the first language (L1) background of learners of English. A particular question arising in the context of automated L1 identification is whether any features that are …

Challenges of building a CMC corpus for analyzing writer's style by age: The DiDi project

Special Issue: Building and annotating corpora of computer-mediated discourse. Issues and Challenges at the Interface of Corpus and Computational Linguistics

Establishing a Standardised Procedure for Building Learner Corpora

Decisions at the outset of preparing a learner corpus are of crucial importance for how the corpus can be built and how it can be analysed later on. This paper presents a generic workflow to build learner corpora while taking into account the needs …

Language as a Detective Story

Article in Academia (science magazine by EURAC and unibz), Bolzano, Italy

Rapid Adaptation of NE Resolvers for Humanities Domains using Active Annotation