The PAISÀ Corpus of Italian Web Texts

Abstract

PAISÀ is a large, Creative Commons-licensed web corpus of contemporary Italian. We describe the design, harvesting, and processing steps involved in its creation.

Publication
Proceedings of the 9th Web as Corpus Workshop (WaC-9)