META - MultilanguagE Text Analyzer

Technical Sheet

Type of software:

Prototype

Technical Description:

The system is implemented in Java.
Supported databases: MySQL 4.1 or later, Microsoft Access
Other resources: WordNet 1.7.1 or WordNet 2.0

Main Features:

  • Import text from: HTML, PDF, DOC, RTF, XLS, TXT, OpenOffice
  • Multi-language architecture
  • Language-independent NLP algorithm
  • Export to: DBMS, LSI, Inverted-index, RDF, XML
  • Cross-platform architecture
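
As an illustration of the inverted-index export target listed above, the following is a minimal sketch in Java. It is not taken from META: the class name InvertedIndexSketch, the method build, and the use of list positions as document ids are all assumptions made for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative sketch only; hypothetical names, not META's actual export code. */
final class InvertedIndexSketch {

    /**
     * Maps each term to the sorted set of ids of the documents that contain it.
     * Assumption: a document id is simply its position in the input list.
     */
    static Map<String, SortedSet<Integer>> build(List<List<String>> tokenizedDocs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < tokenizedDocs.size(); docId++) {
            for (String term : tokenizedDocs.get(docId)) {
                // Create the posting set on first sight of the term, then add the id.
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<Integer>> index = build(List.of(
                List.of("text", "analyzer"),
                List.of("text", "tokenization")));
        // Prints {analyzer=[0], text=[0, 1], tokenization=[1]}
        System.out.println(index);
    }
}
```

The sketch expects documents that have already been tokenized; how META serializes such an index on export is not specified in this sheet.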

NLP Operations:

  • Automatic Language Recognition
  • Text Normalization
  • Text Tokenization
  • Stop Word Elimination
  • Stemming
  • Lemmatization:
    • English: based on WordNet Default Morphological Processor
    • Italian: based on Morph-it!
  • Text Summarization (TF-IDF based)
  • POS-tagging: based on the ACOPOST T3 tagger, a Hidden Markov Model (HMM) tagger (English/Italian)
  • Entity Recognition driven by Ontology
  • Named Entity Recognition (based on the YamCha SVM classifier)
    • English: trained on CoNLL
    • Italian: trained on I-CAB
  • Word Sense Disambiguation (WSD):
    • Knowledge-based WSD (English/Italian)
    • Supervised Method: k-NN classifier trained on SemCor (English) and MultiSemCor (Italian)
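
To make some of the operations above concrete, the following Java sketch chains text normalization, tokenization, and stop word elimination, then picks the sentence with the highest TF-IDF score as a one-sentence extractive summary. It is a sketch under stated assumptions and does not reproduce META's implementation: the class PipelineSketch, its methods, the tiny English stop word list, and the choice to treat each sentence as a separate document for the IDF computation are all hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; hypothetical names, not META's actual code. */
final class PipelineSketch {

    // Assumption: a tiny English list; a real system would load one per language.
    static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "is", "of", "and", "on", "in");

    /** Normalization: Unicode NFKC folding, lower-casing, whitespace collapsing. */
    static String normalize(String text) {
        String folded = Normalizer.normalize(text, Normalizer.Form.NFKC);
        return folded.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    /** Tokenization: split on any run of characters that are not letters or digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Stop word elimination against the hypothetical list above. */
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) kept.add(t);
        }
        return kept;
    }

    static List<String> preprocess(String text) {
        return removeStopWords(tokenize(normalize(text)));
    }

    /**
     * Returns the index of the sentence with the highest summed TF-IDF weight,
     * or -1 for an empty input. Each sentence counts as one document for IDF.
     */
    static int bestSentence(List<String> sentences) {
        List<List<String>> docs = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String s : sentences) {
            List<String> tokens = preprocess(s);
            docs.add(tokens);
            for (String t : new HashSet<>(tokens)) docFreq.merge(t, 1, Integer::sum);
        }
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < docs.size(); i++) {
            Map<String, Integer> termFreq = new HashMap<>();
            for (String t : docs.get(i)) termFreq.merge(t, 1, Integer::sum);
            double score = 0.0;
            for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
                double idf = Math.log((double) docs.size() / docFreq.get(e.getKey()));
                score += e.getValue() * idf;
            }
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(preprocess("The  Cat is ON the mat!"));
        System.out.println(bestSentence(List.of(
                "The cat is on the mat.",
                "The cat chased the dog and the bird.",
                "The cat.")));
    }
}
```

Summing the weights favors longer sentences; dividing the score by the sentence length is a common alternative. Stemming, lemmatization, and the other operations listed above are outside the scope of this sketch.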

Conditions of Use

Contact the authors.