LinguA

LinguA is a state-of-the-art linguistic annotation pipeline that combines rule-based and machine-learning algorithms. It performs the following annotation steps:

  • Sentence splitting
  • Tokenization
  • Part-of-speech tagging and lemmatization
  • Dependency parsing

LinguA lets you analyze your texts, then visualize and download the analysis in CoNLL format, where:

  • sentences are separated by a blank line;
  • each token is on its own line and is annotated with the following linguistic information:
    • Lemma, coarse- and fine-grained Part-of-Speech, morphological features, and syntactic dependency information
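
The layout described above can be read with a few lines of Python. This is a minimal sketch, not LinguA's own code: it assumes tab-separated columns in the CoNLL-X order (ID, form, lemma, coarse POS, fine POS, features, head, dependency relation), which the downloaded files may or may not follow exactly. The function name `read_conll`, the `COLUMNS` list, and the sample tokens and tags are invented for illustration.

```python
from typing import Dict, Iterator, List

# Assumed column order (CoNLL-X style); LinguA's actual output may differ.
COLUMNS = ["id", "form", "lemma", "cpos", "pos", "feats", "head", "deprel"]


def read_conll(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        # zip stops at the shorter sequence, so any extra columns are ignored.
        sentence.append(dict(zip(COLUMNS, fields)))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


# Invented sample data: two sentences separated by a blank line.
sample = (
    "1\tDogs\tdog\tN\tNNS\t_\t2\tSUB\n"
    "2\tbark\tbark\tV\tVBP\t_\t0\tROOT\n"
    "\n"
    "1\tStop\tstop\tV\tVB\t_\t0\tROOT\n"
)
sentences = list(read_conll(sample))
```

Each sentence comes back as a list of dictionaries, so `sentences[0][0]["lemma"]` gives the lemma of the first token. Values are kept as strings; convert `id` and `head` to integers if you need to follow dependency arcs.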

Click here to analyze your Italian or English text.

Detailed descriptions of the Italian tagsets are available here: Part-of-Speech tagset, Syntactic Dependency information. For English, LinguA uses the standard Penn Treebank tagset.


References

Dell’Orletta F. “Ensemble system for Part-of-Speech tagging”. In: Proceedings of EVALITA 2009 – Evaluation of NLP and Speech Tools for Italian 2009 (Reggio Emilia, Italy, December 2009).
Attardi G., Dell’Orletta F. “Reverse Revision and Linear Tree Combination for Dependency Parsing”. In: NAACL-HLT 2009 – North American Chapter of the Association for Computational Linguistics – Human Language Technologies (Boulder, Colorado, June 2009). Proceedings, pp. 261–264. Association for Computational Linguistics, 2009.
Attardi G., Dell’Orletta F., Simi M., Turian J. “Accurate Dependency Parsing with a Stacked Multilayer Perceptron”. In: Proceedings of EVALITA 2009 – Evaluation of NLP and Speech Tools for Italian 2009 (Reggio Emilia, Italy, December 2009).

(Please cite the papers above if you use LinguA in your research.)