It relies on a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine learning, which are dynamically integrated to provide an accurate representation of the linguistic information and of the domain-specific content of English and Italian text corpora from different domains.
T2K allows you to upload your texts and corpora, to store them in your repository and to manage them.
It performs the following knowledge extraction steps:
- linguistic annotation carried out at increasingly complex levels of analysis, i.e. sentence splitting, tokenization, Part-Of-Speech tagging, dependency parsing;
- extraction of the linguistic profile of texts with respect to lexical, morpho-syntactic and syntactic features;
- extraction of domain-specific terminology and phrases;
- extraction of a domain-specific glossary;
- organization and structuring of the set of extracted terms and phrases into taxonomical chains;
- Named Entity Recognition and Classification;
- indexing of the text with respect to the extracted terminology, phrases and Named Entities;
- extraction of relations between terms, phrases and Named Entities;
- construction and visualisation of the relation graph.
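To give a feel for how such steps can chain together, here is a minimal Python sketch of a toy pipeline: sentence splitting, tokenization, term extraction, indexing and a relation graph. It is not T2K's implementation. T2K relies on Part-Of-Speech tagging, dependency parsing and statistical filters, whereas this sketch swaps in much simpler stand-ins (stopword-filtered n-gram counts for terminology, sentence co-occurrence for relations). All function names, the stopword list and the thresholds are assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical, tiny English stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "by", "for", "on", "with"}

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase word tokenizer; keeps accented letters and inner hyphens, drops punctuation."""
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", sentence.lower())

def candidate_terms(tokens, max_len=2):
    """Yield 1..max_len-grams that neither start nor end with a stopword."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS:
                yield " ".join(gram)

def extract_terms(sentences, min_freq=2):
    """Keep candidates seen at least min_freq times (a crude stand-in for statistical filtering)."""
    counts = Counter(t for s in sentences for t in candidate_terms(tokenize(s)))
    return {t: c for t, c in counts.items() if c >= min_freq}

def index_terms(sentences, terms):
    """Map each extracted term to the indices of the sentences it occurs in."""
    index = defaultdict(list)
    for i, s in enumerate(sentences):
        found = set(candidate_terms(tokenize(s)))
        for t in found & set(terms):
            index[t].append(i)
    return dict(index)

def relation_graph(index):
    """Link terms that share a sentence; the edge weight is the number of shared sentences."""
    edges = {}
    for a, b in combinations(sorted(index), 2):
        shared = set(index[a]) & set(index[b])
        if shared:
            edges[(a, b)] = len(shared)
    return edges
```

A typical call sequence would be `sentences = split_sentences(text)`, then `terms = extract_terms(sentences)`, `index = index_terms(sentences, terms)` and finally `relation_graph(index)`. Each step consumes the output of the previous one, so a correction made to an intermediate result, such as removing a spurious term, carries over to the later steps.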
T2K can be tailored to your information needs:
- you can download the results of each knowledge extraction step, correct them, and re-upload them. This allows you to improve the results of the following step according to your corrections;
- you can modify the configurations of the information extraction processes to adapt the extracted knowledge to your information needs.
Click here for T2K (an account is needed for access).
Dell’Orletta F., Venturi G., Cimino A., Montemagni S. (2014) “T2K²: a System for Automatically Extracting and Organizing Knowledge from Texts”. In Proceedings of the 9th Edition of the International Conference on Language Resources and Evaluation (LREC 2014), 26-31 May, Reykjavik, Iceland.
(Please cite the paper above if you make use of T2K in your research)