SWAP - Semantic Web Access and Personalization Research Group


AlBERTo is the first Italian BERT model for Twitter language understanding. Recent scientific studies on natural language processing (NLP) report the outstanding effectiveness of context-dependent, task-independent language understanding models such as ELMo, GPT, and BERT. In particular, these models have been shown to achieve state-of-the-art performance on numerous complex NLP tasks in English, such as question answering and sentiment analysis. Following the popularity and effectiveness that these models are gaining in the scientific community, we trained a BERT language understanding model for the Italian language (AlBERTo). AlBERTo focuses on the language used in social networks, specifically on Twitter. To demonstrate its robustness, we evaluated AlBERTo on the EVALITA 2016 SENTIPOLC (SENTIment POLarity Classification) task, obtaining state-of-the-art results in subjectivity, polarity, and irony detection on Italian tweets.


lesk-wsd-dsm is software that implements a Word Sense Disambiguation (WSD) algorithm based on the simple Lesk approach, extended with distributional semantics to compute the overlap between glosses.

Details about the algorithm are published in the following paper: P. Basile, A. Caputo, and G. Semeraro. An Enhanced Lesk Word Sense Disambiguation algorithm through a Distributional Semantic Model. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1591-1600.
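The core idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lesk-wsd-dsm implementation: the context of the target word and each candidate gloss are represented as the sum of their word vectors, and the sense whose gloss vector has the highest cosine similarity with the context vector is selected. The toy three-dimensional vectors, the glosses, and the sense labels are invented for illustration; a real system would load vectors from a distributional semantic model built on a large corpus, and the published algorithm may differ in its details.

```python
import math
from typing import Dict, List

Vector = List[float]


def centroid(words: List[str], vectors: Dict[str, Vector]) -> Vector:
    """Sum the vectors of the known words; unknown words are skipped.

    Summing is enough because cosine similarity ignores vector length.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for word in words:
        vec = vectors.get(word.lower())
        if vec is not None:
            total = [t + x for t, x in zip(total, vec)]
    return total


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def disambiguate(context: List[str], glosses: Dict[str, str],
                 vectors: Dict[str, Vector]) -> str:
    """Return the sense whose gloss is semantically closest to the context."""
    context_vec = centroid(context, vectors)
    return max(glosses, key=lambda sense: cosine(
        centroid(glosses[sense].split(), vectors), context_vec))


# Hypothetical toy vectors: dimension 0 ~ "nature/water", 1 ~ "finance", 2 ~ other.
VECTORS = {
    "river": [1.0, 0.0, 0.1],
    "water": [0.9, 0.0, 0.2],
    "land": [0.7, 0.1, 0.3],
    "fishing": [0.8, 0.0, 0.1],
    "money": [0.0, 1.0, 0.1],
    "deposits": [0.0, 0.9, 0.2],
    "financial": [0.1, 0.9, 0.0],
    "institution": [0.1, 0.6, 0.4],
}
BANK_GLOSSES = {
    "bank#1": "sloping land beside a body of water",
    "bank#2": "financial institution that accepts deposits",
}

print(disambiguate(["river", "fishing"], BANK_GLOSSES, VECTORS))   # bank#1
print(disambiguate(["money", "deposits"], BANK_GLOSSES, VECTORS))  # bank#2
```

Note that the glosses share no words with the contexts ("river" vs. "water", "money" vs. "financial"), so a purely string-based Lesk overlap would be zero for both senses, while the vector comparison still separates them.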

lesk-wsd-dsm on GitHub

JIGSAW is a knowledge-based Word Sense Disambiguation system that attempts to disambiguate all words in a text by exploiting WordNet senses. Its main assumption is that a specific strategy for each part of speech (POS) works better than a single strategy applied to all of them.

JIGSAW can disambiguate both Italian and English texts. More information is available in: P. Basile, M. Degemmis, A. Gentile, P. Lops, and G. Semeraro. UNIBA: JIGSAW Algorithm for Word Sense Disambiguation. In Proceedings of the 4th ACL 2007 International Workshop on Semantic Evaluations (SemEval-2007), pages 398-401.

JIGSAW on GitHub

SAWA (Similarity Algorithm based-on WikipediA) was developed to suggest semantic annotations for web services. It computes the text-to-text semantic similarity between phrases and returns a value between 0 and 1: 0 means that the phrases are not similar at all, whereas 1 means that they are completely similar. SAWA was created as an extension of a word-to-word similarity algorithm that uses a Wikipedia dump as its corpus. The algorithm is highly optimized and can annotate an entire web service in a few seconds.
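How SAWA aggregates word-to-word scores into a text-to-text score is not described here, so the following Python sketch assumes a common scheme: each word of one phrase is matched with its most similar word in the other phrase, these best-match scores are averaged, and the two directions are then averaged. As long as the word-to-word measure returns values in [0, 1] and scores identical words as 1, the result also stays in [0, 1] and identical phrases score 1. The toy word-to-word scores are hypothetical stand-ins for the Wikipedia-based measure.

```python
from typing import Callable, Dict, FrozenSet, List

WordSim = Callable[[str, str], float]

# Hypothetical word-to-word scores standing in for a Wikipedia-based measure.
TOY_SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"buy", "purchase"}): 0.8,
    frozenset({"car", "vehicle"}): 0.7,
}


def toy_word_sim(a: str, b: str) -> float:
    """Symmetric word-to-word similarity in [0, 1]; identical words score 1."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset({a, b}), 0.0)


def directional(src: List[str], dst: List[str], sim: WordSim) -> float:
    """Average, over the words of src, of their best match in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(sim(w, v) for v in dst) for w in src) / len(src)


def text_similarity(text1: str, text2: str, sim: WordSim = toy_word_sim) -> float:
    """Symmetric text-to-text similarity in [0, 1]."""
    words1, words2 = text1.lower().split(), text2.lower().split()
    return (directional(words1, words2, sim) + directional(words2, words1, sim)) / 2


print(text_similarity("buy car", "purchase automobile"))  # about 0.85
print(text_similarity("buy car", "buy car"))              # 1.0
```

Averaging both directions keeps the measure symmetric, so the score does not depend on which phrase is treated as the query.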


SENSE (SEmantic N-levels Search Engine) is an information retrieval (IR) system that tries to overcome the limitations of the ranked keyword approach by introducing semantic levels that integrate (rather than simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary, and about named entities. SENSE manages documents indexed at three separate levels (keywords, word meanings, and entities) and can combine keyword search with the semantic information provided by the other two indexing levels.
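A minimal Python sketch of how scores from the three indexing levels might be combined. The per-level scoring (a simple term-frequency match, normalized by the best document at that level), the weighted linear combination, and the weights themselves are assumptions made for illustration rather than SENSE's actual ranking model; the sense labels and entity identifiers are hypothetical. In the example, keyword matching alone cannot separate the two documents containing "apple", while the entity level can.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A document indexed at three levels; labels and identifiers are hypothetical.
Doc = Dict[str, List[str]]
LEVELS = ("keyword", "meaning", "entity")


def level_score(query_terms: List[str], doc_terms: List[str]) -> float:
    """Count how many times the query terms occur in the document at one level."""
    counts = Counter(doc_terms)
    return float(sum(counts[term] for term in query_terms))


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1] by dividing by the best score at that level."""
    top = max(scores.values(), default=0.0)
    return {doc_id: (s / top if top > 0 else 0.0) for doc_id, s in scores.items()}


def search(query: Dict[str, List[str]], docs: Dict[str, Doc],
           weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank documents by a weighted sum of their normalized per-level scores."""
    combined = {doc_id: 0.0 for doc_id in docs}
    for level in LEVELS:
        raw = {doc_id: level_score(query.get(level, []), doc.get(level, []))
               for doc_id, doc in docs.items()}
        for doc_id, score in normalize(raw).items():
            combined[doc_id] += weights.get(level, 0.0) * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


DOCS = {
    "d1": {"keyword": ["apple", "pie", "recipe"],
           "meaning": ["apple#fruit", "pie#dish"], "entity": []},
    "d2": {"keyword": ["apple", "iphone", "launch"],
           "meaning": ["launch#event"], "entity": ["Apple_Inc."]},
}
QUERY = {"keyword": ["apple"], "entity": ["Apple_Inc."]}
WEIGHTS = {"keyword": 0.5, "meaning": 0.3, "entity": 0.2}

print(search(QUERY, DOCS, WEIGHTS)[0][0])  # d2
```

Because the keyword level keeps a weight of its own, a document still scores on plain keyword matches when a semantic level has nothing to contribute, which reflects the idea of integrating rather than replacing the lexical level.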

OTTHO (On the Tip of my THOught) is an information seeking system designed to solve a language game that demands knowledge covering a broad range of topics, such as movies, politics, literature, history, proverbs, and popular culture. OTTHO implements a knowledge infusion process that provides background knowledge, allowing a deeper understanding of the items it deals with. The knowledge infusion process consists of two steps: 1) extracting and modeling relationships between words drawn from several knowledge sources; 2) reasoning on the induced models to generate new knowledge. OTTHO extracts knowledge from several sources, such as a dictionary, news, Wikipedia, and various unstructured repositories, and builds a memory of linguistic knowledge and world facts. Starting from external stimuli (e.g., words) that depend on the task to be accomplished, the reasoning mechanism retrieves specific pieces of knowledge from the memory built in the previous step. Beyond the language game, OTTHO has great potential for more practical applications: it could be used to implement an alternative paradigm for associative information retrieval, as well as for computational advertising and recommender systems.
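The two knowledge infusion steps can be illustrated, in a deliberately simplified form, with the Python sketch below: step 1 builds an association memory by counting how often two words co-occur in the same sentence, and step 2 ranks candidate words by their summed association with all the input stimuli. Plain co-occurrence counts and a one-step sum of association weights are assumptions chosen for brevity; OTTHO's actual modeling and reasoning mechanism may differ substantially. The toy "knowledge sources" are invented sentences.

```python
from collections import defaultdict
from itertools import combinations
from typing import DefaultDict, Iterable, List, Tuple

Memory = DefaultDict[str, DefaultDict[str, int]]


def build_memory(sentences: Iterable[str]) -> Memory:
    """Step 1: model word relationships as sentence-level co-occurrence counts."""
    memory: Memory = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            memory[a][b] += 1
            memory[b][a] += 1
    return memory


def associate(memory: Memory, stimuli: List[str], top_k: int = 3) -> List[Tuple[str, int]]:
    """Step 2: rank words by their summed association with all the stimuli."""
    cues = {s.lower() for s in stimuli}
    scores: DefaultDict[str, int] = defaultdict(int)
    for cue in cues:
        # .get avoids creating empty entries for cues missing from the memory
        for word, weight in memory.get(cue, {}).items():
            if word not in cues:
                scores[word] += weight
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical toy knowledge sources.
SOURCES = [
    "snow white cold",
    "milk white drink",
    "ice cold winter",
    "snow winter mountain",
]

memory = build_memory(SOURCES)
print(associate(memory, ["snow", "milk"]))  # [('white', 2), ('cold', 1), ('drink', 1)]
```

The word linked to both stimuli ("white") accumulates evidence from each of them and rises to the top, which is the kind of associative retrieval the paragraph above describes.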

Natural Language Processing (NLP) has a significant impact on many relevant Web-based and Semantic Web applications, such as information filtering and retrieval. Tools supporting the development of NLP applications are playing a key role in text-based information access on the Web.

META (MultilanguagE Text Analyzer) is a text analysis tool designed to provide a general framework for NLP tasks across different languages. The system implements both basic and advanced NLP functionalities, such as Word Sense Disambiguation.

FIRSt (Folksonomy-based Item Recommender System) is a semantic content-based recommender system capable of providing recommendations for items in several domains (e.g., movies, music, books), provided that descriptions of the items are available as text documents (e.g., plot summaries, reviews, short abstracts). The core idea behind FIRSt is to include folksonomies in a classic content-based recommendation model, integrating the static content describing items with dynamic user-generated content (namely tags, obtained through social tagging of the items to be recommended) when learning user profiles.
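To make the idea of integrating static content and tags in a user profile more concrete, the Python sketch below represents each item as a bag of content words plus a bag of tags, builds a profile by summing the features of the items a user liked, and ranks candidate items by cosine similarity to that profile. The vector-space profile, the "c:"/"t:" feature prefixes, and the hypothetical `tag_weight` parameter are illustrative assumptions; FIRSt's actual profile-learning method is not described here and may differ. The movie data are invented.

```python
import math
from collections import Counter
from typing import Dict, List

Item = Dict[str, List[str]]  # {"content": [...words...], "tags": [...tags...]}


def features(item: Item, tag_weight: float) -> Counter:
    """Merge static content and user-generated tags into one weighted bag of features."""
    vec: Counter = Counter()
    for word in item.get("content", []):
        vec["c:" + word] += 1.0
    for tag in item.get("tags", []):
        vec["t:" + tag] += tag_weight
    return vec


def learn_profile(liked: List[Item], tag_weight: float = 0.5) -> Counter:
    """The user profile is the sum of the features of the liked items."""
    profile: Counter = Counter()
    for item in liked:
        profile.update(features(item, tag_weight))
    return profile


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(weight * v[key] for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(profile: Counter, candidates: Dict[str, Item],
              tag_weight: float = 0.5) -> List[str]:
    """Rank candidate item ids by similarity to the profile, best first."""
    return sorted(candidates,
                  key=lambda item_id: cosine(profile, features(candidates[item_id], tag_weight)),
                  reverse=True)


liked = [
    {"content": ["space", "alien", "ship"], "tags": ["scifi"]},
    {"content": ["robot", "future"], "tags": ["scifi", "classic"]},
]
candidates = {
    "m1": {"content": ["alien", "planet"], "tags": ["scifi"]},
    "m2": {"content": ["love", "paris"], "tags": ["romance"]},
}
print(recommend(learn_profile(liked), candidates))  # ['m1', 'm2']
```

Keeping content words and tags in separate feature namespaces lets the tag contribution be tuned independently of the static description, which is one simple way to realize the integration described above.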


STaR (Social Tag Recommender) is a content-based tag recommender system. STaR is based on two main assumptions: 1) resources with similar content should be annotated with similar tags; 2) the previous tagging activity of users should be taken into account. The recommendation approach consists of two steps: 1) a preprocessing step, which uses the Apache Lucene engine to build indexes containing information about the resources already tagged (by the user and by the community); and 2) a filtering step, in which the system retrieves the most similar resources and builds a set of candidate tags, assigning each one a score based on its occurrences in similar resources. STaR participated in the ECML/PKDD Discovery Challenge 2009.
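The two steps can be sketched in Python as follows. Apache Lucene is replaced here by a small in-memory index with bag-of-words cosine similarity, a substitution made only to keep the example self-contained. Following the second assumption, the user's own tagged resources and the community's are kept in separate indexes, and their candidate tags are merged with a weight. Scoring each tag by the summed similarity of the resources it occurs in, the hypothetical `k` and `user_weight` parameters, and the data are illustrative assumptions rather than STaR's published scoring.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple

# An indexed resource: its text and the tags it has already received.
Resource = Tuple[str, List[str]]


def bow(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_tags(text: str, index: List[Resource], k: int = 3) -> Dict[str, float]:
    """Filtering step: score tags from the k resources most similar to the new one."""
    query = bow(text)
    neighbours = sorted(((cosine(query, bow(t)), tags) for t, tags in index),
                        key=lambda pair: pair[0], reverse=True)[:k]
    scores: DefaultDict[str, float] = defaultdict(float)
    for similarity, tags in neighbours:
        if similarity > 0:
            for tag in tags:
                scores[tag] += similarity
    return scores


def recommend_tags(text: str, user_index: List[Resource], community_index: List[Resource],
                   user_weight: float = 0.7, n: int = 3) -> List[str]:
    """Merge personal and community candidates; return the n best tags."""
    combined: DefaultDict[str, float] = defaultdict(float)
    for index, weight in ((user_index, user_weight), (community_index, 1 - user_weight)):
        for tag, score in candidate_tags(text, index).items():
            combined[tag] += weight * score
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:n]]


# Hypothetical personal and community indexes.
user_index = [
    ("python tutorial for beginners", ["python", "programming"]),
    ("pasta recipe with tomato", ["cooking"]),
]
community_index = [
    ("advanced python tips", ["python", "tips"]),
    ("java tutorial", ["java", "programming"]),
]
print(recommend_tags("python tutorial", user_index, community_index))
# ['programming', 'python', 'java']
```

A tag supported by both the user's history and the community ("programming") outranks tags that only one source proposes, while unrelated resources ("cooking") contribute nothing.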
