IMPaCTS (Italian Multilevel Parallel Corpus for Text Simplification)

IMPaCTS is the first Italian text simplification corpus created fully automatically. It contains 1,444,160 original-simplified sentence pairs. For each original sentence, the corpus provides multiple simplifications at different levels of readability, and it is automatically annotated with readability scores and a rich set of linguistic features. The data cover two domains: Wikipedia and administrative texts.
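Because each original sentence is paired with several simplifications, each carrying its own readability score, it can help to think of an entry as one original with a list of scored alternatives. The sketch below illustrates that one-to-many layout and two typical operations: picking the simplification closest to a target readability score, and flattening an entry into training pairs. The class names, field names, domain labels, sentences and score values are all hypothetical, chosen for illustration only; they are not the corpus's documented file format, so check the downloaded data for the actual schema and score scale.

```python
from dataclasses import dataclass, field


# Hypothetical data model: names and fields are illustrative assumptions,
# not the documented IMPaCTS schema.
@dataclass
class Simplification:
    text: str
    readability: float  # assumed: one readability score per simplified sentence


@dataclass
class CorpusEntry:
    original: str
    domain: str  # assumed labels, e.g. "wikipedia" or "administrative"
    simplifications: list[Simplification] = field(default_factory=list)

    def closest_to(self, target: float) -> Simplification:
        """Return the simplification whose score is nearest to `target`."""
        return min(self.simplifications, key=lambda s: abs(s.readability - target))

    def to_pairs(self) -> list[tuple[str, str, float]]:
        """Flatten the entry into (original, simplified, score) triples."""
        return [(self.original, s.text, s.readability) for s in self.simplifications]


# Toy entry with invented sentences and scores, for illustration only.
entry = CorpusEntry(
    original="Il contribuente è tenuto a presentare la dichiarazione entro il termine stabilito.",
    domain="administrative",
    simplifications=[
        Simplification("Il contribuente deve presentare la dichiarazione entro la scadenza.", 40.0),
        Simplification("Devi presentare la dichiarazione entro la scadenza.", 20.0),
    ],
)

print(entry.closest_to(25.0).text)  # prints: Devi presentare la dichiarazione entro la scadenza.
print(len(entry.to_pairs()))  # prints: 2
```

Keeping all simplifications under a single original, rather than storing independent pairs, makes it straightforward to compare how the same sentence is rewritten at different readability levels.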

The resource has been developed to support multiple research purposes. On the one hand, it enables linguistic investigations of the phenomena involved in text simplification; on the other hand, it provides large-scale data for training and evaluating automatic approaches to sentence simplification.

IMPaCTS has been successfully used to fine-tune neural language models for readability-controlled sentence simplification. In particular, models trained on this resource can be conditioned to generate simplified sentences at a target readability level, expressed as a readability score.
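A common way to condition a sequence-to-sequence or instruction-tuned model on a target readability level is to attach the desired score to the input, so that during training the model learns to associate that control signal with the readability of the reference output. The following is a minimal sketch of that general technique under stated assumptions: the `<readability=N>` tag format and the function names are hypothetical, and the prompt or control format used in the paper cited below may be different.

```python
# Hypothetical control-tag format; the format used in the paper may differ.
def build_controlled_input(original: str, target_score: float) -> str:
    """Prefix the source sentence with a target-readability control tag.

    The score is rounded to an integer so the set of tag values stays small.
    """
    return f"<readability={target_score:.0f}> {original}"


def make_training_examples(pairs):
    """Turn (original, simplified, score) triples into (model_input, target) examples.

    The tag carries the score of the reference simplification, so the model
    learns to link each tag value to outputs of that readability.
    """
    return [(build_controlled_input(orig, score), simp) for orig, simp, score in pairs]


# Toy triple with an invented score, for illustration only.
pairs = [
    (
        "Il contribuente è tenuto a presentare la dichiarazione entro il termine stabilito.",
        "Devi presentare la dichiarazione entro la scadenza.",
        20.0,
    )
]
examples = make_training_examples(pairs)
print(examples[0][0])
# prints: <readability=20> Il contribuente è tenuto a presentare la dichiarazione entro il termine stabilito.
```

At inference time, the same helper would be called with the readability score one wants the output to reach, and the fine-tuned model would generate from that tagged input.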

Access the Corpus

Click here to explore and download the corpus.

Reference

Papucci M., Venturi G., Dell’Orletta F. (2026). “Controllable Sentence Simplification in Italian: Fine-Tuning Large Language Models on Automatically Generated Resources”. In Proceedings of the 2026 Language Resources and Evaluation Conference (LREC 2026), 11-16 May 2026, Palma de Mallorca, Spain.

(Please cite the paper above if you make use of this corpus in your research)