Working together towards an ideal infrastructure for language learner corpora

Egon W. Stemle, Adriane Boyd, Maarten Janssen, Therese Lindström Tiedemann, Nives Mikelić Preradović, Alexandr Rosen, Dan Rosén, Elena Volodina

October 2019

PDF Publisher

Abstract

In this article we give an overview of first-hand experiences and starting points for best practices from projects in seven European countries dedicated to learner corpus research and the creation of language learner corpora. The corpora and tools involved in LCR are becoming more and more important, and the careful preparation and easy retrieval, and reusability of corpora and tools has likewise become more important. But with a lack of agreed solutions for many aspects of LCR, interoperability between learner corpora or exchanging data from different learner corpus projects is still challenging. We will illustrate how concepts like metadata, anonymization, error taxonomies and linguistic annotations, as well as tools, toolchains or data formats can individually pose challenges and how they might be solved.

Type

Journal article

Publication

Widening the Scope of Learner Corpus Research. Selected Papers from the Fourth Learner Corpus Research Conference 2017