Open Corpus Interface for Italian Language Learning

Verena Lyding, Claudia Borghetti, Henrik Dittmann, Lionel Nicolas, Egon Stemle

November 2013

Abstract

In this article, we present the multi-faceted interface to the open PAISÀ corpus of Italian. Created within the project PAISÀ (Piattaforma per l’Apprendimento dell’Italiano Su corpora Annotati) [1], the corpus is designed to be freely available for non-commercial processing, usage and distribution by the public. Hence, this automatically annotated corpus (for lemma, part-of-speech and dependency information) is exclusively composed of documents licensed under Creative Commons (CC) licenses [2].The dedicated corpus interface is designed to provide flexible, powerful, and easy-to-use modes of corpus access, with the objective to support language learning, language practicing and linguistic analyses. We present in detail the interface’s functionalities and discuss the underlying design decisions. We introduce the four principal components of the interface, describe supported display formats and present two specific features added to increase the interface’s relevance for language learning. The main search components are (1) a basic search that adopts a “Google-style” search box, (2) an advanced search that provides elaborated graphical search options, and (3) a search that makes use of the powerful CQP query language of the Open Corpus Workbench [3]. In addition, (4) a filter interface for retrieving full-text corpus documents based on keyword searches is available. It is likewise providing the means for building temporary sub-corpora for specific topics. Users can choose among different display formats for the search results. Besides the established KWIC (KeyWord In Context) and full sentence views, graphical representations of the dependency relation information as well as keyword distributions are available. These dynamic displays are based on a visualisation for dependency graphs [4] and one for Word Clouds [5], which build on latest developments in information visualisation for language data. Two special features for novice learners are integrated into each search component. The first feature is a function for restricting search results to sentences of limited complexity. Search results are automatically filtered based on formal text characteristics such as sentence length, vocabulary, etc. The second is the supply of pre-defined search queries for linguistic constructions such as sentences in passive voice, questions, etc. Finally, we show how the PAISÀ interface can be employed in different language teaching tasks. In particular, we present a complete unit of work aimed at learners of Italian (CEFR level A2/B1) and centered on students’ direct use of the interface and its functionalities. By doing so, we are giving concrete examples for targeted searches and interactions with the provided language material, as well as an exemplification of how the use of the corpus can be integrated with communicative language activities in the classroom.

Type

Conference paper

Publication

Proceedings of the International Conference ICT for Language Learning, 6th edition

Open Corpus Interface for Italian Language Learning

Abstract

Related