Terence and Teacher are two corpora of original and manually simplified Italian texts, aligned at the sentence level. Terence contains 32 short Italian novels for children, each paired with a simplified version produced manually by experts (linguists and psycholinguists) for children with text comprehension difficulties. The corpus comprises 1036 original and 1060 simplified sentences. Teacher is a corpus of 18 pairs of Italian documents from different genres (e.g. literature, handbooks) used in educational settings. Each pair consists of an original text and its manually simplified version, produced mainly by teachers, for a total of 266 original and 255 simplified sentences.
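The original and simplified sentence counts differ in both corpora, so some sentences must have been split, merged, added, or dropped during simplification. As a rough illustration, the sketch below computes the simplified-to-original sentence ratio for each corpus and the combined totals. It uses only the figures reported above. The names `CORPORA` and `summarize` are hypothetical and not part of the corpus distribution, and nothing here assumes a particular file format.

```python
# Sentence and document counts as reported in the corpus description.
# The dictionary layout and names are illustrative assumptions only.
CORPORA = {
    "Terence": {"documents": 32, "original": 1036, "simplified": 1060},
    "Teacher": {"documents": 18, "original": 266, "simplified": 255},
}


def summarize(corpora):
    """Return per-corpus simplified/original ratios plus overall totals."""
    summary = {}
    total_original = 0
    total_simplified = 0
    for name, counts in corpora.items():
        # A ratio above 1 means more simplified than original sentences.
        summary[name] = round(counts["simplified"] / counts["original"], 3)
        total_original += counts["original"]
        total_simplified += counts["simplified"]
    summary["total_original"] = total_original
    summary["total_simplified"] = total_simplified
    return summary


SUMMARY = summarize(CORPORA)

if __name__ == "__main__":
    for key, value in SUMMARY.items():
        print(f"{key}: {value}")
```

A ratio above 1 (Terence) means the simplified side has more sentences than the original. A ratio below 1 (Teacher) means it has fewer. The counts alone do not show which operations produced the difference.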
Click here to download the corpus.
Brunato D., Dell’Orletta F., Venturi G., Montemagni S. (2015) “Design and Annotation of the First Italian Corpus for Text Simplification”. In Proceedings of the 9th Linguistic Annotation Workshop (LAW’15), 5 June, Denver, Colorado, USA.
(Please cite the paper above if you make use of this corpus in your research)