- Sentence splitting
- Tokenization
- Part-of-speech tagging and lemmatization
- Dependency parsing
LinguA lets you analyze your texts, then visualize and download the analysis in CoNLL format, where:
- sentences are separated by a blank line;
- each token is on its own line and is annotated with the following linguistic information:
  - lemma;
  - coarse- and fine-grained part of speech;
  - morphological features;
  - syntactic dependency information.
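A downloaded analysis in this layout is easy to read programmatically. The Python sketch that follows is a minimal example. It assumes the 10-column, tab-separated CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). The exact column order of LinguA's download is an assumption. The names `parse_conll`, `arcs` and `SAMPLE` are hypothetical. The sample sentence is hand-written for illustration and is not real LinguA output. Its fine-grained tags are Penn Treebank tags, but the coarse tags and dependency labels are made up.

```python
from dataclasses import dataclass

# Hypothetical hand-written sample in the ASSUMED CoNLL-X column order
# (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL).
# Coarse tags and dependency labels are illustrative, not LinguA's tagset.
SAMPLE = (
    "1\tThe\tthe\tD\tDT\t_\t2\tdet\t_\t_\n"
    "2\tcat\tcat\tN\tNN\t_\t3\tnsubj\t_\t_\n"
    "3\tsleeps\tsleep\tV\tVBZ\t_\t0\tROOT\t_\t_\n"
    "4\t.\t.\tF\t.\t_\t3\tpunct\t_\t_\n"
    "\n"
    "1\tHi\thi\tI\tUH\t_\t0\tROOT\t_\t_\n"
)


@dataclass
class Token:
    id: int       # position of the token in its sentence, starting at 1
    form: str     # the word as it appears in the text
    lemma: str
    cpos: str     # coarse-grained part of speech
    pos: str      # fine-grained part of speech
    feats: str    # morphological features, "_" if none
    head: int     # ID of the syntactic head, 0 for the sentence root
    deprel: str   # dependency relation to the head


def parse_conll(text: str) -> list[list[Token]]:
    """Split CoNLL text into sentences (separated by blank lines) of Token objects."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        current.append(Token(
            id=int(fields[0]), form=fields[1], lemma=fields[2],
            cpos=fields[3], pos=fields[4], feats=fields[5],
            head=int(fields[6]), deprel=fields[7],
        ))
    if current:  # the last sentence may not be followed by a blank line
        sentences.append(current)
    return sentences


def arcs(sentence: list[Token]) -> list[tuple[str, str, str]]:
    """Return (head form, relation, dependent form) triples; head 0 becomes 'ROOT'."""
    forms = {tok.id: tok.form for tok in sentence}
    return [(forms.get(tok.head, "ROOT"), tok.deprel, tok.form) for tok in sentence]


if __name__ == "__main__":
    for sent in parse_conll(SAMPLE):
        for head, rel, dep in arcs(sent):
            print(f"{head} -{rel}-> {dep}")
        print()
```

The key step is the HEAD column. Each value is an index into the same sentence, with 0 marking the root. Reading it that way turns the flat token lines into a dependency tree: the set of head-to-dependent arcs that a visualizer draws.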
LinguA can analyze both Italian and English text.
Detailed descriptions of the Italian tagsets are available for both the Part-of-Speech tags and the Syntactic Dependency relations. For English, LinguA uses the standard Penn Treebank tagset.
References
Dell’Orletta F. “Ensemble system for Part-of-Speech tagging”. In: Proceedings of EVALITA 2009 – Evaluation of NLP and Speech Tools for Italian 2009 (Reggio Emilia, Italy, December 2009).
Attardi G., Dell’Orletta F. “Reverse Revision and Linear Tree Combination for Dependency Parsing”. In: NAACL-HLT 2009 – North American Chapter of the Association for Computational Linguistics – Human Language Technologies (Boulder, Colorado, June 2009). Proceedings, pp. 261–264. Association for Computational Linguistics, 2009.
Attardi G., Dell’Orletta F., Simi M., Turian J. “Accurate Dependency Parsing with a Stacked Multilayer Perceptron”. In: Proceedings of EVALITA 2009 – Evaluation of NLP and Speech Tools for Italian 2009 (Reggio Emilia, Italy, December 2009).
(Please cite the papers above if you use LinguA in your research.)