Projects | Italian Natural Language Processing Lab

Future Artificial Intelligence Research – FAIR

A novel eXplainable Artificial intelligence approach for low back pain phenotyping and outcome prediction: towards a patient-centric spine care model – XAI-CARE

LingUistic Complexity Evaluation in educaTion – LuCET

Teaming up with social artificial agents – TEAMING-UP

Empowering Knowledge Extraction to Empower Learners – EKEEL

The Language Of Dreams – LODE


Current projects

Future Artificial Intelligence Research – FAIR.
A 24-month project (2023-2025) funded by the National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR), under the framework of the Next Generation EU (NGEU) programme.
FAIR aims to: i) advance frontier research in AI, both fundamental and applied; ii) reduce the fragmentation of AI research; iii) develop and adopt human-centered and trustworthy AI in the public and private sectors; iv) increase AI-based innovation and the development of AI technology; v) develop AI-driven policies and services in the public sector; and vi) create, retain and attract AI talent.
The FAIR project involves 4 research institutions (CNR, Bruno Kessler Foundation, INFN, and IIT), 12 universities (Politecnico di Milano, Politecnico di Torino, Sapienza, Scuola Normale Superiore, Università Campus Biomedico di Roma, Università di Bologna, Università di Pisa, Università di Trento, Università di Bari, Università della Calabria, Università di Catania, Università di Napoli “Federico II”), and 5 companies (Bracco, Expert.ai, Intesa Sanpaolo, Leonardo, Lutech).
The ItaliaNLP Lab is involved in two sub-projects of FAIR:
1) Spoke 5: HIGH-QUALITY AI, Work Package “Natural Language Generation and Text Quality Assessment”. This WP investigates computational approaches to determining, extracting, and improving the quality of text, whether machine-generated or human-created. In this context, text quality encompasses not only form and content but also social and ethical qualities, including all types of bias in verbal communication.
2) Transversal Project 2: Vision, Language and Multimodal Challenges. The objective of this transversal project is to develop a family of language models that treat Italian as a first-class language and support multimodal inputs and outputs. The TP also covers the creation of language-only and multimodal benchmarks specifically tailored to Italian.

A novel eXplainable Artificial intelligence approach for low back pain phenotyping and outcome prediction: towards a patient-centric spine care model – XAI-CARE.
A 24-month project (2023-2025) funded by the National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR), under the framework of the Next Generation EU (NGEU) programme, Mission 6 – Health, in collaboration with Fondazione Policlinico Universitario Campus Bio-Medico, Università degli Studi di Sassari and Università degli Studi di Messina.
This project aims to develop and demonstrate the clinical use of a novel interactive, AI-based platform for the personalized care of low back pain (LBP), a societal challenge affecting millions of people in Europe and worldwide.
The specific aims are to: 1) validate a platform that allows physicians to estimate the risk of LBP onset, predict outcomes and foster personalized LBP care based on patients’ phenotypes; 2) develop a platform that allows patients to evaluate the progression of LBP and the effect of treatments; 3) test the platforms in a real-world clinical setting and compare them with standard care.

LingUistic Complexity Evaluation in educaTion – LuCET.
A 24-month project (2023-2025) funded by the Italian Ministry of Education, University and Research (PRIN 2022 PNRR), in collaboration with the University of Venezia “Ca’ Foscari”, the University of Modena e Reggio Emilia, the University of Roma 3 and INVALSI.
LuCET investigates Linguistic Complexity (LC) in Italian and its relationship with user-based Processing Difficulty (PD). Combining expertise from theoretical, computational and applied linguistics, psycholinguistics, pedagogy and psychometrics, the project is the first attempt to offer a comprehensive, multi-level set of measures and procedures to assess LC and to provide a systematic account of its relationship with PD. The study will focus on students attending the last grade of secondary school, a population less investigated than younger school-age children. In particular, it will consider different populations of grade 13 students (approximately aged 18-19), with typical and atypical developmental profiles, and with Italian as a first or additional language.

Teaming up with social artificial agents – TEAMING-UP.
A 24-month project (2023-2025) funded by the Italian Ministry of Education, University and Research (PRIN 2022), in collaboration with the University of Pisa.
The Teaming-up project brings together an interdisciplinary group of experts in economics, computational linguistics, and bioengineering to systematically explore: 1) which types of task (e.g., mechanical vs creative) are better suited to particular types of artificial agents (e.g., agents with different degrees of human-likeness); 2) how the agents should communicate (verbally and non-verbally) with their human peers to increase performance; 3) how intelligent agents should perceive and process emotions when performing the task. To this end, the project will exploit human-agent interactions collected during lab and field experiments. It will rely on two social robots at the E. Piaggio Center, University of Pisa: Abel, a hyper-realistic humanoid with a youthful appearance and no specific gender, and FACE, a robot with the appearance of a woman (both using Social Emotional Artificial Intelligence).

Empowering Knowledge Extraction to Empower Learners – EKEEL.
A 24-month project (2023-2025) funded by the Italian Ministry of Education, University and Research (PRIN 2022 PNRR), in collaboration with the Università degli Studi di Genova, Istituto di Scienza e Tecnologie dell’Informazione (CNR) and Istituto per le Tecnologie Didattiche (CNR).
The EKEEL project focuses on extracting knowledge from educational videos in terms of the concepts explained and the dependency relations among them, in particular prerequisite relations, which express the knowledge required to understand another concept. The main goal of the project is to develop an innovative approach, based on Natural Language Processing (NLP) and Video Processing techniques, for knowledge extraction and modeling from educational videos that takes into account the dynamics of concepts throughout the video stream. The output of the project will enable a new wave of enhanced educational services and improve the consumption of video content.

The Language Of Dreams – LODE.
A 24-month project (2023-2025) funded by the Italian Ministry of Education, University and Research (PRIN 2022), in collaboration with Scuola IMT Alti Studi Lucca.
The primary aims of the project are: i) advancing our understanding of the neurophysiology underlying dreaming, specifically the anatomical and functional processes that contribute to the formation and depiction of dream imagery; ii) exploring the applicability of natural language processing techniques to the analysis of verbal dream reports, with particular emphasis on their relevance to neurological disorders such as epilepsy, brain injuries, and neurodegenerative conditions; iii) establishing an IT infrastructure tailored to the systematic collection, secure storage, and comprehensive analysis of both dream reports and relevant health-related data.

Recent past projects

Reading recommendation systems tailored to the reader’s language skills and interests – LettERE (Letture pER TE).
A two-year project (2022-2024) funded by Regione Toscana (Progetti Congiunti di Alta Formazione – POR FSE 2014-2020 Investimenti a favore della crescita e dell’occupazione) in collaboration with M.E.T.A. Srl company.
The project aims to promote the practice of reading by developing a system that offers personalised reading recommendations tailored to readers’ language skills and reading preferences. These are determined from the reviews of books that readers claim to have read and enjoyed.
The goal of LettERE is to develop a web portal enriched with automatic recommendation functions, where readers play a central role in promoting reading by expressing their reading interests and preferences.

Book Batch One. Soluzioni per la configurazione e la produzione efficiente di prodotti editoriali personalizzati ad alto valore aggiunto – BBO.
Book Batch One. Solutions for the configuration and efficient production of customised, high value-added publishing products
A one-year project (2021) in collaboration with M.E.T.A. Srl company, F2 Srl company and the University of Cagliari – Department of Pedagogy, Psychology, Philosophy.
The BBO project aims to introduce substantial innovations in the book and periodical publishing sector, enabling the design, configuration and production of customised digital and physical publishing products intended for small or very small production runs, down to a single copy (batch one).
The project aims to act, simultaneously and synergistically, at two levels:
Content customisation, which frees the book from predefined organisational and taxonomic logics and allows, for example, teachers to structure a textbook containing all and only the topics they intend to cover during the school year;
Formal customisation, both visual and interactive (for the digital book) and physical (for the ‘paper’ book), which can be modulated in terms of layout and pagination, but also in terms of three-dimensional packaging, with countless possibilities for shaping objects that border on art and design, for example through 3D printing techniques.

New strategies for promoting attendance in colorectal cancer screening programmes in Tuscany – BESTcc (behavioural economics, sigmoidoscopy and TC colonography) study.
A 3-year project (2020-2023) funded by Regione Toscana (Bando Ricerca Salute 2018) in collaboration with the Institute for Cancer Research, Prevention and Clinical Network (ISPRO) and the Local Health Units Toscana Centro, Nord-Ovest and Sud-Est.
The aim of the project is to evaluate the impact of different strategies for inviting people to colorectal cancer (CRC) screening, since attendance rates in screening programmes remain suboptimal in Italy. The underlying idea is to rely on new approaches based on “behavioural economics” concepts in order to gain new insights into people’s behaviour and develop new tools to promote screening uptake.

Digital Helpers per e-Communities in Sanità – DHelp4H.
Digital Helpers for e-Communities in Healthcare
A 2-year project (2021-2022) funded by Regione Lazio (Bando POR FESR LAZIO 2014-2020) in collaboration with Webmonks Srl company, Campus Bio-Medico University of Rome (UCBM) and Institute for Cognitive Sciences and Technologies (ISTC-CNR).
The objectives of the project are to: i) create software enabling biomedical engineers to rapidly develop e-Community and Digital Help web/mobile applications based on Instant Messaging and chatbots; ii) optimize digital help algorithms for the healthcare sector with Natural Language Processing and Machine Learning techniques; iii) use the system to create an e-Community of patients suffering from Multiple Sclerosis, together with chatbots able to answer both general questions on the functioning of the UCBM Polyclinic and specific questions on the treatment of fatigue through personalized electroceutical therapy.

Soluzioni innovative per la creazione, la certificazione, il riuso e la condivisione di unità didattiche digitali all’interno del sistema Scuola – SchoolChain.
Innovative solutions for the creation, certification, reuse and sharing of digital teaching units within the School system
A 2-year project (2018-2020) funded by Regione Toscana (Bando POR FESR 2014-2020) in collaboration with the M.E.T.A. Srl company, MATHEMA Srl company and F2 Srl company.
The aim of the project is to exploit emerging digital technologies to relaunch the role of teachers as creators of original knowledge, enhancing the intangible but precious component of educational creativity and empathic mediation of knowledge. The project aims to define an optimized format for the digital Lesson Object, which is assembled from elementary content units (the Cognitive Learning Objects) and is in turn organized into free curricular schemes. Each Lesson Object is the product of an individual teacher’s educational creativity, yet remains interchangeable within a wide network of original educational resources available to the School system.

Augmented RealTime Learning for Secure workspace – ARTILS.
A 2-year project (2018-2020) funded by Regione Toscana (Bando POR FESR 2014-2020) in collaboration with 01Sistemi Srl company, CENTRICA Srl company and VIDITRUST Srl company.
The project aims to develop an augmented and virtualized reality platform that adds a layer of information to the virtualized real context and, through dynamic models and simulations, supports the work of technicians on building sites and in industrial plants. ARTILS aims to give professionals access to all intangible information (what and where: floor plans, equipment temperature, safety status, risks, etc.) and procedural information (how to intervene: by activating training modules and interactive manuals according to the operational need for intervention) in order to optimize industrial processes and ensure greater workplace safety.

Automatic Data and documents Analysis to enhance human-based processes – ADA.
A 2-year project (2018-2020) funded by Regione Toscana (Bando POR FESR 2014-2020) in collaboration with the IT companies Hyperborea s.r.l., Erre Quadro Engineering and NETRESULTS Srl, the company SUPEREVO Srl, and the Multimedia Information Retrieval (MIR) group at the Institute of Information Science and Technologies (ISTI) of CNR Pisa.
The aim of this project is to develop a platform that drives innovation in the production process within the Industry 4.0 framework. The platform will use technologies based on artificial intelligence and big data analysis to tackle the challenges this framework poses. It will allow the collection, organization and smart retrieval of information from technical texts and images at all stages of the production process. The main innovative functionalities provided by ADA will be: assisted document drafting; multimodal analysis of text and images; automatic extraction of information from technical documents; blockchain technology to secure certification processes; and testing and predictive maintenance.

Personalizzazione di pERcorsi FORMativi Avanzati – PERFORMA.
Personalization of Advanced Learning Paths
A two-year project (2017-2019) funded by Regione Toscana (Progetti Congiunti di Alta Formazione – POR FSE 2014-2020 Asse A – Occupazione) in collaboration with Meta srl company.
The project will develop innovative methodologies for the creation and personalization of e-learning courses through the integration of NLP-based functionalities aimed at assessing the adequacy of educational materials with respect to the language skills of each learner’s profile and to the characteristics of different reading devices.

UBIquitous Massive Open Learning – UBIMOL.
A 2-year project (2017-2019) funded by Regione Toscana (Bando POR FESR 2014-2020) in collaboration with M.E.T.A. Srl company, 01Sistemi Srl company, VIDITRUST Srl company, PERSAFE Srl company and the CoLing Lab of the Department of Philology, Literature, and Linguistics (University of Pisa).
The project aims to develop an e-learning platform enriched with innovative technologies able to offer language courses personalized to the language skills of each learner profile. Advanced Natural Language Processing techniques will enable learners to self-assess their developmental growth over time, both in terms of the new content learned and of the written language competences acquired during the course.

ENgaging Content Object for Reuse and Exploitation of cultural resources – ENCORE.
A 24-month project (2017-2019) in collaboration with M.E.T.A. Srl company, F2 Glocal Innovation company and the Università degli Studi di Salerno.
The project will develop a system based on an innovative approach for the production, access and reuse of cultural resources offering users personalized narrative pathways to access cultural and touristic heritage contents.

NLP-based technologies for the educational domain.
A 4-year project (2016-2019) funded by the INDIRE institute (National Institute for Documentation, Innovation and Research on Education) of the Ministry of Education.
The project aims to exploit state-of-the-art NLP tools and Information Extraction technologies to classify, organize and semantically index the content of different types of documents provided by the INDIRE institute that are relevant to the educational domain (such as work-related learning projects, reports of newly recruited teachers, etc.).

Cultural Heritage Resources Orienting Multimodal Experience – CHROME.
A 36-month project (2017-2020) funded by the Italian Ministry of Education, University and Research (PRIN 2015), in collaboration with the Università degli Studi di Napoli “Federico II”, Università degli Studi Roma Tre, Università degli Studi di Salerno and Istituto di Scienze Applicate e Sistemi Intelligenti “Eduardo Caianiello” (CNR).
The main output of the CHROME project will be a methodology to collect, represent and analyse multimodal cultural heritage content and present it through artificial agents whose behaviour is inspired by an accurate analysis of expert guides, museum curators and tour operators. Jointly carried out by humanists and computer scientists, the project will make it possible to model the behaviour that gatekeepers adopt when presenting cultural heritage. Such a model will be used to control a humanoid robot designed to follow similar presentation strategies.

Voci della Grande Guerra.
An 18-month project (2016-2018) funded by the Presidenza del Consiglio dei Ministri in the framework of the First World War Centenary events, in collaboration with the CoPhiLab of the Institute for Computational Linguistics “A. Zampolli” (ILC), the CoLing Lab of the Department of Philology, Literature, and Linguistics (University of Pisa), Accademia della Crusca and the Interuniversity Center for Historical-Military Research (University of Siena).
The project aims at building a corpus of different types of documents (letters, war bulletins, journals, diaries) to investigate how Italian people perceived and narrated the First World War and how this war contributed to changing the Italian language.
The project includes: i) the digitization of the corpus; ii) the development and application of NLP-based modules for event extraction and georeferencing of war locations; iii) the design and development of a Web search interface.

Social sensing for breaking news – SMART NEWS.
A 2-year project (2016-2018) funded by Regione Toscana (BANDO FAR-FAS 2014) in collaboration with the IT company Hyperborea s.r.l., the Web Application for the Future Internet (WAFI) Laboratory at the Institute of Informatics and Telematics (IIT), and the Multimedia Information Retrieval (MIR) group at the Institute of Information Science and Technologies (ISTI) of CNR Pisa.

ItaliaNLP-WAFI. Cyber Intelligence.
A 2-year project (2016-2018) funded by the Institute of Informatics and Telematics (IIT), in collaboration with the Web Application for the Future Internet (WAFI) Laboratory of IIT-CNR Pisa.
The project aims at developing and adapting advanced Natural Language Processing (NLP) tools and techniques for automatic linguistic analysis and domain-knowledge extraction from social media texts in the field of Cyber Intelligence.

Collaborative Research Agreement with M.E.T.A. srl company.
A 1-year agreement (2015-2016) with M.E.T.A. srl company aimed at developing NLP-based technologies for Educational Applications.

SCRIBE – Scritture Brevi, Semplificazione Linguistica, Inclusione Sociale: Modelli e Applicazioni.
SCRIBE – Short Writings, Linguistic Simplification, Social Inclusion: Models and Applications.
A 3-year project (2013-2016) funded by the Italian Ministry of Education, University and Research (PRIN 2010FWM3B4 – Area 10). Project partners: Università di Tor Vergata, Università “L’Orientale” di Napoli, Università Roma Tre, Università di Macerata, Università di Pisa, ILC-CNR.
The project aims to study, from both a synchronic and a diachronic perspective, the production of short, condensed messages, from its contemporary expressions (short writings used in e-mails, SMS and chats) to the other abbreviation strategies peculiar to Italian and dialectal graphic and linguistic systems. The goal of the ItaliaNLP Lab is to develop advanced computational linguistics methods for the analysis of these varieties of Italian.

iSLe – intelligent Semantic Liquid eBook
A 2-year project funded by Regione Toscana (POR CReO 2007 – 2013) in collaboration with IT companies (M.E.T.A SRL, 01Servizi SRL, VIDITRUST SRL, SPACE SPA).
The aim of the project is to develop an innovative software platform for digital educational publishing augmented with NLP-based functionalities for knowledge management and readability assessment.

L’amministrazione della giustizia in Italia: il caso della neurogenetica e delle neuroscienze, un approccio multidisciplinare.
The Administration of Justice in Italy: the case of neurogenetics and neuroscience, a multidisciplinary approach.
Progetto Premiale MIUR. Prize Project MIUR.
A one-year CNR research project (2013-2014) funded by the Italian Ministry of Education, University and Research (MIUR), proposed by Istituto di Studi Giuridici Internazionali (ISGI), Istituto di Linguistica Computazionale «Antonio Zampolli» (ILC), Istituto di Ricerche sulla Popolazione e le Politiche Sociali (IRPPS) and Istituto di Scienze e Tecnologie della Cognizione (ISTC).
The project addresses a focused set of closely-related problems at the intersection of neuroscience and criminal justice from a multidisciplinary perspective: in particular, it aims at assessing whether, and if so how, neuroscientific evidence is starting to be admitted and evaluated in individual cases in Italy. The goal of the ItaliaNLP Lab within the project is to develop advanced technologies for the textual and linguistic analysis of cases with the final aim of building a terminological resource of Neuroscience in Law.

Legal Text Mining: costruzione di reti semantico-concettuali finalizzate a una navigazione intelligente di corpora di testi giuridici (JURNET).
Legal Text Mining: building semantic networks to support advanced queries in legal textual corpora (JURNET)
A 2-year project (2013-2014) funded by Regione Toscana (POR CRO FSE 2007-2013 Asse IV – Capitale Umano). Project partners: ILC-CNR, Istituto di Teoria e Tecniche dell’Informazione Giuridica (ITTIG-CNR), Scuola Superiore Sant’Anna (Pisa), European Center for Law, Science and New Technologies (ECLT) – Università degli Studi di Pavia.
The project aims to create the prerequisites for advanced access to the knowledge contained in case law corpora: in particular, NLP methods and techniques will be used to build the semantic network of concepts and/or citations linking case law texts.

Collaboration ILC – Vodafone Omnitel N.V.
A one-year contract (2011-2012) aimed at developing and specializing NLP tools to be used within Vodafone’s drafting platform Right&Clear for assessing text readability and supporting, whenever required, text simplification.

Paisà – Piattaforma per l’Apprendimento dell’Italiano Su corpora Annotati.
Paisà – Platform for Corpus-Assisted Italian Language Learning
A 3-year project (2009-2012) funded by the Italian Ministry of Education, University and Research (Firb 2007), in collaboration with the University of Bologna (Project Director Sergio Scalise), ILC-CNR, the University of Trento and Eurac (Bolzano). The project has built a large, freely available, richly annotated corpus of Italian, as well as lexical databases to be automatically acquired from it.

PORTALITA – Piattaforma di servizi integrati per l’accesso semantico e plurilingue ai contenuti culturali italiani nel web.
PORTALITA – An integrated services platform for semantic and multilingual access to Italian cultural contents in the web
A 3-year project (2009-2012) funded by the Italian Ministry of Education, University and Research (Firb 2007, prot. RBNE07C4R9). Project partners: Dip. di Studi italianistici dell’Università di Pisa (coordinator); Dip. di Informatica dell’Università di Pisa; Dip. di Storia delle Arti dell’Università di Pisa; Dip. di Italianistica e Spettacolo dell’Università di Roma “La Sapienza”; Consorzio ICoN – Italian Culture on the Net; Direzione Generale per i Beni Librari e gli Istituti Culturali (DGBLIC); CAP s.p.a.
The project has built a web platform with advanced tools for semantic and multilingual access to Italian cultural content on the web, including NLP-based knowledge extraction and management tools.