CItA (Corpus Italiano di Apprendenti L1)

CItA (Corpus Italiano di Apprendenti L1) is the first freely available digitized corpus of essays written by L1 learners of Italian. It was collected in 7 lower secondary schools in different areas of Rome: 3 in the historical center and 4 in the suburbs. The current version of the corpus contains 1,353 unscored essays (369,456 tokens in total), manually annotated for errors and their corrections, and it is continually updated. It is accompanied by a 34-question questionnaire on the students' biographical, socio-cultural and sociolinguistic background.
The resource was jointly compiled by the ItaliaNLP Lab and the experimental pedagogists of the “Dipartimento di Psicologia dei processi di Sviluppo e socializzazione”, Università di Roma “La Sapienza”.


Click here to download the corpus.


Barbagli A., Lucisano P., Dell’Orletta F., Montemagni S., Venturi G. (2016). CItA: an L1 Italian Learners Corpus to Study the Development of Writing Competence. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), 23-28 May, Portorož, Slovenia.

(Please cite the paper above if you use this corpus in your research.)