Technical Sheet
Type of software:
Prototype
Technical Description:
The system is implemented in Java.
Supported databases: MySQL 4.1 or later, Microsoft Access
Other resources: WordNet 1.7.1 or WordNet 2.0
Main Features:
- Import text from: HTML, PDF, DOC, RTF, XLS, TXT, OpenOffice
- Multi-language architecture
- Language-independent NLP algorithm
- Export to: DBMS, LSI, Inverted-index, RDF, XML
- Cross-platform architecture
NLP Operations:
- Automatic Language Recognition
- Text Normalization
- Text Tokenization
- Stop Word Elimination
- Stemming
- Lemmatization:
  - English: based on WordNet Default Morphological Processor
  - Italian: based on Morph-it!
- Text Summarization (TF/IDF based)
- POS tagging: based on the ACOPOST T3 tagger, a Hidden Markov Model (HMM) tagger (English/Italian)
- Entity Recognition driven by Ontology
- Named Entity Recognition (based on the YamCha SVM classifier)
  - English: trained on CoNLL
  - Italian: trained on I-CAB
- Word Sense Disambiguation (WSD):
  - Knowledge-based WSD (English/Italian)
  - Supervised method: k-NN classifier trained on SemCor (English) and MultiSemCor (Italian)
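Illustrative sketch (not the system's actual code): the listing below shows one way that text normalization, tokenization, stop word elimination and TF/IDF-based summarization could be chained in Java. The class name TfIdfSummarizer, its method names and the tiny stop-word list are all hypothetical. Each sentence is treated as a document for the IDF computation, which is an assumption of this sketch and not a documented detail of the prototype.

```java
import java.util.*;

// Hypothetical sketch: names and design are assumptions, not the prototype's API.
class TfIdfSummarizer {

    // Tiny illustrative English stop-word list (assumption; a real list is far longer).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "the", "a", "an", "of", "and", "is", "in", "on", "to", "it"));

    // Normalization (lower-casing), tokenization (split on non-alphanumerics)
    // and stop word elimination in a single pass.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Scores every sentence as the sum of tf * idf over its distinct terms,
    // where idf = ln(N / df) and each sentence counts as one document.
    static double[] scoreSentences(List<String> sentences) {
        int n = sentences.size();
        List<List<String>> tokenized = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String s : sentences) {
            List<String> toks = tokenize(s);
            tokenized.add(toks);
            for (String term : new HashSet<>(toks)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            Map<String, Integer> termFreq = new HashMap<>();
            for (String term : tokenized.get(i)) {
                termFreq.merge(term, 1, Integer::sum);
            }
            for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                scores[i] += e.getValue() * idf;
            }
        }
        return scores;
    }

    // One-sentence extractive summary: returns the highest-scoring sentence,
    // or an empty string when there is no input.
    static String summarize(List<String> sentences) {
        if (sentences.isEmpty()) {
            return "";
        }
        double[] scores = scoreSentences(sentences);
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return sentences.get(best);
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "The cat sat on the mat.",
                "The cat is a cat.",
                "Quantum parsers tag rare Italian words.");
        System.out.println(summarize(sentences));
    }
}
```

Summing raw scores favours long sentences. A variant could divide each score by the sentence's token count, and a stemming or lemmatization step could be applied to the tokens before counting.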
Conditions of Use
Contact the authors.