Web Corpus Creation, Cleaning and Evaluation


The web has become increasingly popular as a source of linguistic data, not only within the NLP community but also among lexicographers and linguists. Accordingly, web corpora continue to gain importance, owing to their size and their diversity of genres and text types. However, a number of issues in web corpus construction still require further research, ranging from questions of corpus design to more technical aspects of the efficient construction of large corpora. Similarly, the systematic evaluation of web corpora, for example through task-based comparisons with traditional corpora, has only recently come into focus. This year we are excited to meet at Electronic Lexicography in the 21st Century: Linking Lexical Data in the Digital Age (eLex 2015). Our meeting provides a forum for those with a common interest in innovative developments in lexicography and (very large) web-based corpora. We will set the stage with a talk covering topics from web corpus construction to the evaluation of web corpora.

Web as Corpus Meeting@eLex 2015
Herstmonceux Castle, UK