SWAP - Semantic Web Access and Personalization Research Group

Semeraro.PhDGraduates History


July 21, 2011, at 10:49 AM EST by 193.204.187.218 -
Added lines 7-8:

Annalina Caputo (ciclo XXIII)

Deleted lines 9-10:

Annalina Caputo (ciclo XXIII)

July 21, 2011, at 10:48 AM EST by 193.204.187.218 -
Changed lines 468-469 from:

(^)

to:
Changed lines 524-525 from:



\\\

to:

(^)

July 21, 2011, at 10:46 AM EST by 193.204.187.218 -
Added lines 9-10:

Annalina Caputo (ciclo XXIII)

Added lines 468-469:

(^)

Added lines 471-524:

Annalina Caputo

Semantics and Information Retrieval: Models, Techniques and Applications

Abstract

The dialogue between humans and machines takes place on two different levels. Along the path from the user’s mind to a machine representation, concepts, relationships and meanings are translated into a flat, unstructured form deprived of their original meaning. This process, which also affects text representation, impacts on Information Access systems, and in particular on Information Retrieval (IR) systems. The key concept in such systems is information, but when text is represented as an unordered sequence of words the retrieval task becomes a mere string-matching process. In this context, the user’s vagueness and word ambiguity become a major challenge for IR systems.

Over the past decades, several attempts have been made to move beyond the traditional keyword-search paradigm, often by introducing techniques to capture word meanings. The result is a vast body of approaches that aim to harness semantics for Information Retrieval, working on two different fronts. The former introduces semantics by modeling word meaning directly in the document representation. The latter builds an improved query representation by shifting from what the user asks to what the user wants. However, the general feeling is that dealing explicitly with semantic information alone does not significantly improve the performance of text retrieval systems.

The work presented in this thesis explores the usage of semantics in Information Retrieval on two separate fronts: documents and queries. Semantics has many facets and several interpretations in Computer Science, but this thesis focuses on lexical semantics.

The first part of this dissertation deals with semantics in documents. First, SENSE (SEmantic N-levels Search Engine) is presented: an IR system that tries to overcome the limitations of the ranked-keyword approach by introducing semantic levels that integrate (rather than simply replace) the lexical level represented by keywords.

Two algorithms are proposed for representing word meanings in SENSE: the former is based on Word Sense Disambiguation, while the latter exploits Word Sense Discrimination.
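To make the N-levels idea concrete, here is a minimal sketch (toy documents, invented synset identifiers, and an arbitrary linear score combination, none of which come from the thesis) in which each document is indexed at a keyword level and at a meaning level, and both levels contribute to the ranking:

    docs = {
        "d1": {"words": {"bank", "river"}, "senses": {"bank.n.09", "river.n.01"}},
        "d2": {"words": {"bank", "loan"},  "senses": {"bank.n.02", "loan.n.01"}},
    }

    def score(q_words, q_senses, doc, w_lex=0.5, w_sem=0.5):
        # Each level produces its own overlap score; the levels integrate, not replace.
        lex = len(q_words & doc["words"]) / max(len(q_words), 1)
        sem = len(q_senses & doc["senses"]) / max(len(q_senses), 1)
        return w_lex * lex + w_sem * sem

    q_words, q_senses = {"bank"}, {"bank.n.02"}   # the financial sense of "bank"
    ranked = sorted(docs, key=lambda d: score(q_words, q_senses, docs[d]), reverse=True)
    print(ranked)   # ['d2', 'd1']: the meaning level breaks the keyword tie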

The second part of this work tackles semantics in queries. One such approach is Query Expansion (QE). Two well-known QE algorithms are investigated within the SENSE framework: Rocchio and Local Context Analysis.
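As a reminder of what Rocchio does, here is a minimal sketch (toy documents; the alpha/beta/gamma weights follow the classic formulation, not necessarily the thesis's settings):

    from collections import Counter

    def rocchio_expand(query, relevant_docs, nonrelevant_docs,
                       alpha=1.0, beta=0.75, gamma=0.15, k=5):
        """Return the original query terms plus the top-k expansion terms."""
        expanded = Counter({t: alpha for t in query})
        for doc in relevant_docs:                    # move toward the relevant centroid
            for term, freq in Counter(doc).items():
                expanded[term] += beta * freq / len(relevant_docs)
        for doc in nonrelevant_docs:                 # move away from the non-relevant centroid
            for term, freq in Counter(doc).items():
                expanded[term] -= gamma * freq / len(nonrelevant_docs)
        new_terms = [t for t, w in expanded.most_common() if t not in query and w > 0]
        return list(query) + new_terms[:k]

    query = ["semantic", "retrieval"]
    rel = [["semantic", "search", "wordnet", "retrieval"],
           ["sense", "disambiguation", "retrieval"]]
    nonrel = [["database", "sql"]]
    print(rocchio_expand(query, rel, nonrel))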

Lastly, this thesis faces the problem of building complex queries able to represent concepts and their relationships. Complex queries are built by exploiting the quantum algebra for structured queries within the Quantum IR framework.

All proposed algorithms and approaches are evaluated on standard test collections, and the results show that most of them are effective ways of improving the retrieval task. The methods presented in this thesis demonstrate that the question is how, rather than whether, to add semantics to IR.


Added line 526:

\\\

May 26, 2011, at 07:14 AM EST by 193.204.187.33 -
Changed lines 5-21 from:

Pierpaolo Basile ciclo XXI

Marco de Gemmis ciclo XVI

Anna Lisa Gentile ciclo XXII

Leo Iaquinta ciclo XXII

Oriana Licchelli ciclo XVII

Pasquale Lops ciclo XVI

Ignazio Palmisano ciclo XIX

Domenico Redavid ciclo XX

Eufemia Tinelli ciclo XXI

to:

Pierpaolo Basile (ciclo XXI)

Marco de Gemmis (ciclo XVI)

Anna Lisa Gentile (ciclo XXII)

Leo Iaquinta (ciclo XXII)

Oriana Licchelli (ciclo XVII)

Pasquale Lops (ciclo XVI)

Ignazio Palmisano (ciclo XIX)

Domenico Redavid (ciclo XX)

Eufemia Tinelli (ciclo XXI)

May 26, 2011, at 07:13 AM EST by 193.204.187.33 -
Changed line 50 from:
to:
Changed line 79 from:
to:
Changed line 136 from:
to:
Changed line 169 from:
to:
Changed line 252 from:
to:
Changed line 271 from:
to:
Changed line 330 from:
to:
Changed line 389 from:
to:
May 26, 2011, at 07:11 AM EST by 193.204.187.33 -
Changed line 50 from:

to:
Changed line 79 from:

to:
Changed line 136 from:

to:
Changed line 169 from:

to:
Changed line 252 from:

to:
Changed line 271 from:

to:
Changed line 330 from:

to:
Changed line 389 from:

to:
May 26, 2011, at 07:08 AM EST by 193.204.187.33 -
Added line 50:

Deleted line 51:

Added line 79:

Deleted line 80:

Added line 136:

Deleted line 137:

Added line 169:

Deleted line 170:

Added line 252:

Deleted line 253:

Added line 271:

Deleted line 272:

Added line 330:

Deleted line 331:

May 26, 2011, at 07:06 AM EST by 193.204.187.33 -
Added line 389:

Deleted line 390:

May 26, 2011, at 07:04 AM EST by 193.204.187.33 -
Added lines 22-23:
May 26, 2011, at 07:03 AM EST by 193.204.187.33 -
Deleted lines 22-26:

\\\

Added line 25:
May 26, 2011, at 07:02 AM EST by 193.204.187.33 -
Added lines 5-6:

Pierpaolo Basile ciclo XXI

Changed lines 8-12 from:

Pasquale Lops ciclo XVI

to:

Anna Lisa Gentile ciclo XXII

Leo Iaquinta ciclo XXII

Added lines 14-16:

Pasquale Lops ciclo XVI

Added line 18:
Changed line 20 from:

Pierpaolo Basile ciclo XXI

to:
Deleted lines 21-22:

Anna Lisa Gentile ciclo XXII Leo Iaquinta ciclo XXII

May 26, 2011, at 07:00 AM EST by 193.204.187.33 -
Added lines 4-13:


Marco de Gemmis ciclo XVI Pasquale Lops ciclo XVI Oriana Licchelli ciclo XVII Ignazio Palmisano ciclo XIX Domenico Redavid ciclo XX Pierpaolo Basile ciclo XXI Eufemia Tinelli ciclo XXI Anna Lisa Gentile ciclo XXII Leo Iaquinta ciclo XXII

Added lines 15-19:

\\\

Changed line 23 from:
to:

Changed lines 45-47 from:

Paquale Lops

to:

Pasquale Lops

Added line 74:

Changed lines 81-128 from:
to:

The rapid evolution of Internet services has led to a constantly increasing number of web sites and to an increase in the available information. Today, the main challenge is to support web users in order to facilitate navigation through web sites and to improve searching within extremely large web repositories, such as Digital Libraries or other generic information sources. Personalization, a possible approach to the problem, involves techniques and mechanisms that reduce this information overload and facilitate the delivery of relevant information personalized to the preferences of individual users. Machine Learning techniques have a significant role to play in the development of personalized services within Digital Libraries. For example, many Machine Learning techniques are well suited for transforming user-activity data into useful preference rules as part of a user profile. In web systems, user profiles manage information that refers to the user's knowledge in a domain, to her/his personality, to her/his preferences, or to any other information about the user that can be useful in the configuration of an application.
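As a hint of how such techniques work, the sketch below (hypothetical activity features and labels; scikit-learn assumed available) turns a tiny user-activity log into readable preference rules with a decision tree:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical activity log: [pages_viewed, seconds_on_page, downloaded(0/1)]
    X = [[1, 10, 0], [8, 240, 1], [2, 30, 0], [9, 300, 1], [7, 180, 1], [1, 15, 0]]
    y = [0, 1, 0, 1, 1, 0]   # 1 = the user marked the item as interesting

    clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
    # Print the induced preference rules in human-readable form.
    print(export_text(clf, feature_names=["pages_viewed", "seconds_on_page", "downloaded"]))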

This thesis explores the role of user profiles in web applications such as online bookshops, Digital Libraries and e-Learning. In particular, it analyzes the possibility of enlarging the availability of teaching materials provided by an e-learning system by reusing materials existing in external sources, such as digital libraries. Therefore, the major research topic addressed by the thesis is the improvement of the search for educational materials on the Web. From an educational perspective, related questions include: what type of search tool should be provided to students to assist them in their search for course-related materials on the Web? Should it leave the students in control of their search strategy, or should it apply a meta-search-like automatic modification of their search queries? During a search session in an e-learning system, the learner can obtain a sequence that helps motivate her/him to learn and prevents her/him from becoming frustrated. This sequence is the result of a search modified on the basis of the information contained in the student model, which describes the preferences, needs and interests of the student and her/his learning performance.

This thesis describes the design and implementation of a personalization system (Profile Extractor) that analyzes the data coming from the interaction between users and the web application to automatically discover, using Machine Learning techniques, the user's preferences, needs and interests. Moreover, it shows the possible uses of the user profiles created by the Profile Extractor system in two domains, online bookshops and digital libraries, where several different experiments have been carried out in order to measure the efficiency of the user profiles. These two domains have been used as test beds for the implemented techniques and, since the results of the experiments have been encouraging, the techniques have then been applied to the area of Student Modelling, that is, the adoption of user profiles in the e-learning domain. Several experiments have been carried out to check the efficiency of the user profiles in this context and to compare the effectiveness of the numeric algorithms implemented by the Profile Extractor system with symbolic ones, from the area of Inductive Logic Programming, implemented by another system, along with an evaluation of their efficiency in order to decide how best to exploit them in the induction of student profiles in future work.

Added line 131:

Added line 164:

Changed line 175 from:

and other equipment (e.g.: networked scienti c instruments for e-Science or

to:

and other equipment (e.g.: networked scientific instruments for e-Science or

Changed line 186 from:

di erent organizations o er. The emergence of Web Service technology allowed this

to:

different organizations offer. The emergence of Web Service technology allowed this

Changed lines 192-193 from:

entities grounded within the real world (such as those o ered by network or utility providers) that may o er some provision of value in some domain. As the

to:

entities grounded within the real world (such as those offered by network or utility providers) that may offer some provision of value in some domain. As the

Changed line 199 from:

ow de nition languages such as

to:

flow definition languages such as

Changed line 213 from:

These de nitions can be applied both to Web Services applications and to agentbased

to:

These definitions can be applied both to Web Services applications and to agent-based

Changed lines 216-217 from:

following analogy illustrates these concepts and their di erences:

to:

following analogy illustrates these concepts and their differences:

Changed lines 220-222 from:

From the Web service perspective, an orchestration is a declarative speci - cation that describes a work ow to support the execution of a speci c business

to:

From the Web service perspective, an orchestration is a declarative specification that describes a workflow to support the execution of a specific business

Changed line 225 from:

orchestrate in a set of services, we need to be able to nd, select, combine and

to:

orchestrate in a set of services, we need to be able to find, select, combine and

Changed line 230 from:

ofWeb Service remaining entirely in the SemanticWeb sphere. In particular,

to:

of Web Services, remaining entirely in the Semantic Web sphere. In particular,

Added line 244:
Added line 247:

Added line 266:

Deleted line 300:
Added line 325:

Added line 384:

October 01, 2010, at 04:26 AM EST by 193.204.187.101 -
Changed line 244 from:

This thesis aims to advance the state of the art in research on efficient reasoning

to:

This thesis aims at advancing the state of the art in research on efficient reasoning

Changed lines 250-253 from:

The contributions of this research could be summarized as follows:

  • We describe a preliminary matchmaking approach which investigates instance modeling in a relational database. It is not dependent on

the domain and it implements several match classes, exploiting SQL standard only. Moreover, two ontologies modeling different domains are used to built several datasets of instances in order to better verify services performance.

  • On the basis of results of the above mentioned approach, we present a complete matchmaking algorithm specially suitable for skill matching. Distinguishing features include: the possibility to express both strict requirements and preferences in the user request, a logic-based ranking of retrieved instances and the explanation of rank results. All services only rely on ad hoc queries translated in standard SQL: no built-in operator and/or new constructor are exploited. In this approach, both requests and offers must be expressed using the same reference template which is necessary to define their structure and expressiveness.
to:

The contributions of this research can be summarized as follows:

  • We describe a preliminary matchmaking approach which investigates instance modeling in a relational database. It is not dependent on the domain and it implements several match classes, exploiting the SQL standard only. Moreover, two ontologies modeling different domains are used to build several datasets of instances in order to better verify service performance.
  • On the basis of the results of the above-mentioned approach, we present a complete matchmaking algorithm particularly suitable for skill matching. Distinguishing features include: the possibility to express both strict requirements and preferences in the user request, a logic-based ranking of retrieved instances and the explanation of rank results. All services rely only on ad hoc queries translated into standard SQL: no built-in operators and/or new constructors are exploited.
October 01, 2010, at 04:06 AM EST by 193.204.187.101 -
Changed line 210 from:

In order to offer a powerful and automated retrieval process, the simple keywordbased

to:

In order to offer a powerful and automated retrieval process, the simple keyword-based

Changed lines 212-213 from:

process can be time-consuming and unsatisfactory. Generally speaking, those systems are keyword-based and then a user can express only her mandatory requirements (there

to:

process can be time-consuming and unsatisfactory. Generally the user can express only her mandatory requirements (there

Changed line 214 from:

systems return not ranked and often irrelevant results without explanations. The efficiency

to:

systems return often irrelevant results without explanations. The efficiency

Changed lines 216-221 from:

frameworks able to perform the match among user requests and offers. From this point of view, it is noteworthy that non-logical approaches to resource retrieval and matchmaking have serious limitations. For example, by exploiting standard relational database techniques to model a resource retrieval framework, there is the need to completely align the attributes of the offered and requested resources, in order to perform a match. If requests and offers are simple names or strings, the only possible

to:

frameworks able to perform the match among user requests and offers. If requests and offers are simple names or strings, the only possible

Deleted lines 220-224:

Moreover, in real contexts, very often there are no offers that are better than the others ones from every user selection criteria. We consider that in these cases, i.e., when exact matches are lacking, instead of receiving an empty set as search result, user could accept worse alternatives gradly or she could negotiate the original requirements for compromises.

Changed line 230 from:

The final goal is to retrieve only the best offers, opportunely ranked, w.r.t. the user

to:

The final goal is to retrieve only the best offers, opportunely ranked, with respect to the user

Changed lines 233-239 from:

The problemof reasoning efficiency is not new in literature. Knowledge Compilation, infact, is a technique exploited for making reasoning computationally easier in a knowledge base (KB) typicallymodelled using a logical formalism. The idea of knowledge compilation is to split query answering into two phases:

  • in the first one the knowledge base is preprocessed, thus obtaining an appropriate data structure (such a phase is sometimes called off-line reasoning);
  • in the second phase, the query is actually answered using the output of the first phase (such a phase is sometimes called on-line reasoning).
to:
Changed lines 236-237 from:

classical relational database systems (RDBMS) and languages i.e., SQL, for storing the KB and to perform reasoning tasks. Several approaches have been presented in which

to:

classical relational database systems (RDBMS) and languages, i.e., standard SQL, for storing the KB and performing reasoning tasks, respectively. Several approaches have been presented in which

Changed line 241 from:

with this work approaches will be discussed because the problem of preference

to:

with the approaches of this work will be discussed because the problem of preference

Changed line 246 from:

intends to show how appropriatemodeling of the KB can improve semantic matchmaking

to:

intends to show how appropriate modeling of the KB can improve semantic matchmaking

Changed lines 248-255 from:

aspects and dimensions:

  • Application – For what is resource retrieval used? And resource composition? Application fields are several and different.
  • Efficient semantic matchmaking – What language is necessary to build both user request and offer semantic description? What data structure enables to perform reasoning tasks? How can we evaluate the reasoning efficiency?
    • KB modeling – How is it possible to respect an Open-world Assumption by means of an RDBMS based on the Closed-world Assumption? Other issue is related to stored information useful for retrieval. In other terms, we discuss which data (structured and not, instances and ontological inforation) have to be stored in order to provide services such as matchmaking, ranking and match explanation.
    • Match classes – Which match classes are allowed? Which algorithms are implemented? Is the system scalable in the sense that the retrieval time quite linearly increases with the data size?
    • Complementary facilities – Which information is used to explain the score obtained for each result? For the end user is important to express both necessary requirements and desiderable ones in her request. Hence, the matchmaker have to be able to deal efficiently with strict and soft constraint, respectively.
to:

aspects and dimensions.

Changed lines 251-254 from:
  • We describe a preliminary matchmaking approach which investigates instance modeling in a relational database1. It is domain independent and it implements several match classes, exploiting SQL standard only. Limits are the followings: no ranked list of results is returned and the potential match is not complete because generally it retrieves a bigger set of results containing irrelevant results also. Moreover, two ontology modeling different domains are used to built several datasets of instances in order to better verify services performance.
  • On the basis of results of the above mentioned approach, we present a complete matchmaking algorithmspecially suitable for skill matching. Distinguishing features include: the possibility to express both strict requirements and preferences in the user request, a logic-based ranking of retrieved instances and the explanation of rank results. All services only rely on ad hoc queries translated in standard SQL: no built-in operator and/or new constructor are exploited. In this approach, both requests and offers must be expressed using the same reference template which is necessary to define their structure and expressivity.
  • As proof-of-concept, we present a tool developed for providing skill matching and team-work composition. A main aim is the design of an user-friendly GUI both for browsing easily the domain ontology (in order to compose the query) and for better explain the retrieved results.
  • We describe other efficient techniques for resource retrieval and composition in domains as Ubiquitous Computing and Business Process. We present a possible integration between semantic matchmaking services and user profiling ones and, finally, we investigate the problem of core competence extraction.
to:
  • We describe a preliminary matchmaking approach which investigates instance modeling in a relational database. It is not dependent on the domain and it implements several match classes, exploiting the SQL standard only. Moreover, two ontologies modeling different domains are used to build several datasets of instances in order to better verify service performance.
  • On the basis of the results of the above-mentioned approach, we present a complete matchmaking algorithm particularly suitable for skill matching. Distinguishing features include: the possibility to express both strict requirements and preferences in the user request, a logic-based ranking of retrieved instances and the explanation of rank results. All services rely only on ad hoc queries translated into standard SQL: no built-in operators and/or new constructors are exploited. In this approach, both requests and offers must be expressed using the same reference template, which is necessary to define their structure and expressiveness.
  • As a proof of concept, we present a tool developed to provide skill matching and team-work composition. The main issue is to design a user-friendly GUI both for easily browsing the domain ontology (in order to compose the query) and for better explaining the retrieved results.
  • We describe other efficient techniques for resource retrieval and composition in domains such as Ubiquitous Computing and Business Processes. Moreover, we present a possible integration between semantic matchmaking services and user profiling services and, finally, we investigate the problem of core competence extraction.
September 28, 2010, at 05:07 AM EST by 193.204.187.33 -
Added lines 201-277:

Efficient Reasoning Techniques for Large Datasets of DLs Instances: Approaches And Applications

Abstract

Nowadays more and more people choose to employ the Internet and/or automated procedures as the infrastructure and means for communication, search and resource repositories. In this context, both services and goods are considered resources. The main aim is to provide new business opportunities allowing more efficient management of information.

In order to offer a powerful and automated retrieval process, simple keyword-based search is not sufficient. In fact, in on-line websites and portals the search process can be time-consuming and unsatisfactory. Generally speaking, those systems are keyword-based, so a user can express only her mandatory requirements (there is no possibility to select features according to wishes or negotiable constraints). Such systems return unranked and often irrelevant results, without explanations. The efficiency of such retrieval engines is therefore determined by the efficacy of their underlying frameworks, which must perform the match between user requests and offers. From this point of view, it is noteworthy that non-logical approaches to resource retrieval and matchmaking have serious limitations. For example, when exploiting standard relational database techniques to model a resource retrieval framework, there is the need to completely align the attributes of the offered and requested resources in order to perform a match. If requests and offers are simple names or strings, the only possible match would be identity, resulting in an all-or-nothing outcome. On the other hand, pure knowledge-based approaches require heavy computational capabilities, hence response times are often unacceptable.

Moreover, in real contexts, very often there is no offer that is better than all the others with respect to every user selection criterion. We consider that in these cases, i.e., when exact matches are lacking, instead of receiving an empty set as a search result, the user could accept somewhat worse alternatives, or she could negotiate the original requirements for compromises. In business scenarios, another important issue is dealing with very large datasets of resources. Hence, retrieval efficiency is measured both by data scalability and by parameters such as the allowed match classes, the relevant results obtained, the ranking functions and the expressivity of the query language. Of course, this work cannot cover the full range of reasoning services. Instead, the focus of the thesis is the presentation of efficient resource matchmaking and composition approaches in several business contexts and, in particular, the contribution that Knowledge Representation (KR), specifically Description Logics (DLs), can provide to improve scenarios where demand (user requests) meets offers (goods and services). The final goal is to retrieve only the best offers, opportunely ranked, with respect to the user request.

The problem of reasoning efficiency is not new in the literature. Knowledge Compilation, in fact, is a technique exploited for making reasoning computationally easier in a knowledge base (KB) typically modelled using a logical formalism. The idea of knowledge compilation is to split query answering into two phases:

  • in the first one the knowledge base is preprocessed, thus obtaining an appropriate data structure (such a phase is sometimes called off-line reasoning);
  • in the second phase, the query is actually answered using the output of the first phase (such a phase is sometimes called on-line reasoning).

Matchmaking approaches presented in this work are based on KB pre-processing in order to reduce on-line reasoning. A relevant aspect of this thesis is the exploitation of classical relational database systems (RDBMS) and languages, i.e., SQL, for storing the KB and performing reasoning tasks. Several approaches have been presented in which databases allow users and applications to access both ontologies and other structured data in a seamless way. An overview and a comparison among these will also be presented. Finally, preference-based models and systems sharing some characteristics with the approaches of this work will be discussed, because the problem of preference handling in RDBMSs is not new in information retrieval systems.
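A minimal sketch of that off-line/on-line split, using SQLite and a toy class taxonomy (the schema and data are illustrative assumptions, not the thesis's actual KB design): the taxonomy's transitive closure is compiled off-line, so on-line instance retrieval becomes a plain SQL lookup.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE subclass(sub TEXT, sup TEXT);     -- asserted TBox axioms
    CREATE TABLE instance_of(ind TEXT, cls TEXT);  -- ABox assertions
    CREATE TABLE closure(sub TEXT, sup TEXT);      -- filled by off-line reasoning
    """)
    conn.executemany("INSERT INTO subclass VALUES (?,?)",
                     [("Programmer", "Employee"), ("Employee", "Person")])
    conn.execute("INSERT INTO instance_of VALUES ('anna', 'Programmer')")

    # Off-line phase: compile the transitive closure of the class hierarchy once.
    conn.execute("""
    WITH RECURSIVE c(sub, sup) AS (
        SELECT sub, sup FROM subclass
        UNION
        SELECT c.sub, s.sup FROM c JOIN subclass s ON c.sup = s.sub)
    INSERT INTO closure SELECT sub, sup FROM c""")

    # On-line phase: instance retrieval is now a lookup, with no DL reasoner in the loop.
    print(conn.execute("""
    SELECT ind FROM instance_of
    WHERE cls = 'Person'
       OR cls IN (SELECT sub FROM closure WHERE sup = 'Person')""").fetchall())
    # [('anna',)]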

This thesis aims to advance the state of the art in research on efficient reasoning techniques for managing very large datasets of DL instances; in particular, the work intends to show how appropriate modeling of the KB can improve semantic matchmaking and, eventually, match explanation. The discussion will take into account several aspects and dimensions:

  • Application – What is resource retrieval used for? And resource composition? Application fields are numerous and diverse.
  • Efficient semantic matchmaking – What language is necessary to build the semantic descriptions of both user requests and offers? What data structure enables reasoning tasks to be performed? How can we evaluate reasoning efficiency?
  • KB modeling – How is it possible to respect the Open-World Assumption by means of an RDBMS based on the Closed-World Assumption? Another issue relates to the stored information useful for retrieval. In other terms, we discuss which data (structured and not, instances and ontological information) have to be stored in order to provide services such as matchmaking, ranking and match explanation.
  • Match classes – Which match classes are allowed? Which algorithms are implemented? Is the system scalable, in the sense that the retrieval time increases roughly linearly with the data size?
  • Complementary facilities – Which information is used to explain the score obtained for each result? For the end user it is important to express both necessary requirements and desirable ones in her request. Hence, the matchmaker has to be able to deal efficiently with strict and soft constraints, respectively.

The contributions of this research can be summarized as follows:

  • We describe a preliminary matchmaking approach which investigates instance modeling in a relational database. It is domain independent and it implements several match classes, exploiting the SQL standard only. Its limits are the following: no ranked list of results is returned, and the potential match is not complete because it generally retrieves a larger set of results that also contains irrelevant ones. Moreover, two ontologies modeling different domains are used to build several datasets of instances in order to better verify service performance.
  • On the basis of the results of the above-mentioned approach, we present a complete matchmaking algorithm particularly suitable for skill matching (sketched in SQL below). Distinguishing features include: the possibility to express both strict requirements and preferences in the user request, a logic-based ranking of retrieved instances and the explanation of rank results. All services rely only on ad hoc queries translated into standard SQL: no built-in operators and/or new constructors are exploited. In this approach, both requests and offers must be expressed using the same reference template, which is necessary to define their structure and expressivity.
  • As a proof of concept, we present a tool developed to provide skill matching and team-work composition. A main aim is the design of a user-friendly GUI both for easily browsing the domain ontology (in order to compose the query) and for better explaining the retrieved results.
  • We describe other efficient techniques for resource retrieval and composition in domains such as Ubiquitous Computing and Business Processes. We present a possible integration between semantic matchmaking services and user profiling services and, finally, we investigate the problem of core competence extraction.
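A minimal sketch of that strict-versus-soft distinction in standard SQL (table, columns and scoring are invented for illustration): strict requirements are WHERE conditions, while preferences only contribute to a ranking score.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE candidate(name TEXT, knows_java INT, knows_sql INT, years INT)")
    conn.executemany("INSERT INTO candidate VALUES (?,?,?,?)",
                     [("ada", 1, 1, 6), ("bob", 1, 0, 9), ("eve", 0, 1, 2)])

    rows = conn.execute("""
    SELECT name,
           knows_sql + (years >= 5) AS score   -- soft preferences, summed into a rank
    FROM candidate
    WHERE knows_java = 1                       -- strict requirement, filters out 'eve'
    ORDER BY score DESC""").fetchall()
    print(rows)   # [('ada', 2), ('bob', 1)]: ranked, not all-or-nothing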
September 24, 2010, at 05:36 AM EST by 193.204.187.33 -
Added lines 206-258:

Entities and Identities: Named Entity Processing with Cultural Knowledge

Abstract

Natural Language is a means to express and discuss concepts, objects and events, i.e. it carries semantic content. Reading a written text implies comprehension of the information that the words carry. Comprehension is an intrinsic capacity for a human, but not for a machine. One of the ultimate roles of Natural Language Processing techniques is identifying the meaning of the text, providing effective ways to make a proper linkage between textual references and real-world objects, thus enabling machines to have a bit of the understanding that is proper to a human.

A proper name is a word or a list of words that refers to a real world object. Linguistic Expressions with the same reference may have different senses, so it is necessary to disambiguate between them.

Natural Language Processing (NLP) operations include text normalization, tokenization, stop-word elimination, stemming, Part-Of-Speech tagging and lemmatization. Further steps, such as Word Sense Disambiguation (WSD) or Named Entity Recognition (NER), are aimed at enriching texts with semantic information. Named Entity Disambiguation (NED) is the procedure that resolves the correspondence between real-world entities and mentions within text. The thesis addresses the problem of giving a sense to proper names in a text, that is, the problem of automatically associating words representing Named Entities with their identities, i.e. unique real-world objects. The thesis also copes with the lack of training and testing data for such a task.

The proposed approaches automatically associate each entity in a text with a unique identifier, a URI from Wikipedia, which is used as an "entity provider".

The main contribution consists of proposing knowledge-based approaches for NED, which do not require training data. Specifically, the thesis proposes two solutions:

  • a completely knowledge-based algorithm for NED, exploiting Wikipedia data
  • a Semantic Relatedness (SR) approach for the NED task: SR scores are obtained by a graph-based model over Wikipedia

The first solution has been tested for the Italian language: due to the lack of Italian testing data for this task, the thesis shows a method to automatically build a testbed dataset from Wikipedia. The second solution has been tested on a gold-standard dataset for NED: the proposed algorithm achieves results competitive with the state of the art.

Both suggested solutions are completely knowledge-based, with the advantage that no training data is needed: indeed, manually annotated data for this task is not easily available and acquiring such data can be expensive.
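To give a flavour of the graph-based relatedness idea, here is a minimal, self-contained sketch: the candidate URIs and link sets are toy stand-ins for the Wikipedia link graph, and Jaccard overlap stands in for the thesis's actual SR model.

    links = {  # page -> set of linked pages (a tiny slice of the Wikipedia graph)
        "Bat_(animal)": {"Mammal", "Echolocation", "Nocturnality"},
        "Baseball_bat": {"Baseball", "Sports_equipment", "Wood"},
        "Baseball":     {"Baseball_bat", "Sports_equipment", "Ball"},
    }

    def relatedness(a, b):
        la, lb = links.get(a, set()), links.get(b, set())
        return len(la & lb) / len(la | lb) if la | lb else 0.0

    def disambiguate(candidates, context_entities):
        # Pick the candidate URI most related, on average, to the context entities.
        return max(candidates,
                   key=lambda c: sum(relatedness(c, e) for e in context_entities))

    print(disambiguate(["Bat_(animal)", "Baseball_bat"], ["Baseball"]))
    # -> "Baseball_bat": it shares graph links with the surrounding entity "Baseball"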

September 24, 2010, at 05:24 AM EST by 193.204.187.33 -
Changed lines 152-162 from:
Consider a dance with more than one dancer. Each dancer has a set of

steps that he will perform; they orchestrate their own steps because they are in complete control of their domain (their body). A choreographer ensures that the steps all of the dancers make are according to some overall scheme; we call this a choreography. The dancers have a single view point of the dance, while the choreography has a multi-party or global view point of the dance. Orchestration is about describing and executing a single view point model, while choreography is about describing and guiding a global model. It is possible to derive the single view point model from the global model by projecting based on participant.

to:
Consider a dance with more than one dancer. Each dancer has a set of steps that he will perform; they orchestrate their own steps because they are in complete control of their domain (their body). A choreographer ensures that the steps all of the dancers make are according to some overall scheme; we call this a choreography. The dancers have a single view point of the dance, while the choreography has a multi-party or global view point of the dance. Orchestration is about describing and executing a single view point model, while choreography is about describing and guiding a global model. It is possible to derive the single view point model from the global model by projecting based on participant.
Changed lines 161-162 from:

perform them. Therefore, it is needed to realize the following use cases: Discovery, Selection, Composition and Invocation.

to:

perform them. Therefore, it is needed to realize the following use cases: Discovery, Selection, Composition and Invocation.

September 24, 2010, at 05:22 AM EST by 193.204.187.33 -
Changed line 152 from:

''Consider a dance with more than one dancer. Each dancer has a set of

to:
Consider a dance with more than one dancer. Each dancer has a set of
Changed line 161 from:

model from the global model by projecting based on participant.''

to:

model from the global model by projecting based on participant.

September 24, 2010, at 05:21 AM EST by 193.204.187.33 -
Added lines 101-186:

Towards the Orchestration of Semantic Web Services

Abstract

The evolution and ubiquity of the Internet has facilitated the proliferation of distributed resources, such as computer systems and software applications. Organizations are increasingly utilizing resources that span traditional organizational boundaries, like shared databases or processor farms, to share expensive computing resources and other equipment (e.g. networked scientific instruments for e-Science or cyber-infrastructure projects) or to pool together Enterprise resources distributed widely across networks and geographies. Software applications are also evolving from monolithic, stove-pipe applications to loosely federated, interacting services that depend on networked resources to provide optimal functionality. This evolution, powered by the dot-com bubble at the turn of the century, emerged to automate and outsource business processes to a worldwide audience, both at the Business-to-Business (B2B) level and for Business-to-Customer (B2C) applications, improving the user experience. This new software engineering approach enabled distributed, heterogeneous software components to communicate and interoperate through declarative, machine-readable descriptions of the services that different organizations offer. The emergence of Web Service technology allowed this migration for both enterprise and Grid-based applications due to its exploitation of the near-ubiquitous World-Wide-Web infrastructure, cross-platform interoperability, and the fact that it is built upon de facto Web standards for syntax, addressing, and communication protocols. Several conceptualizations of a service have been proposed, ranging from electronic services that facilitate B2B e-commerce, to business entities grounded within the real world (such as those offered by network or utility providers) that may offer some provision of value in some domain. As the provision of these services has moved from a developer-driven mechanism to one involving automatic runtime selection (requiring service discovery support), the descriptions of the APIs and protocols have become increasingly declarative. The Web Service paradigm introduced the concept of homogeneous, XML-based representation of service descriptions using interface and workflow definition languages such as WSDL, BPEL4WS, and WS-Choreography. Nevertheless, whilst these approaches facilitated easier access and usage of web services for developers, they failed to address many of the knowledge-based problems associated with the diversity of service providers, i.e. interface and data heterogeneity. Semantic Web Services (SWS) address this problem by providing a declarative, ontological framework for describing services, messages, and concepts in a machine-readable format that can also facilitate logical reasoning. Thus, service descriptions can be interpreted based on their meanings, rather than simply being a symbolic representation. Semantic Web Services aim to extend the Web Service integration process in order to facilitate automated (or semi-automated) composition, discovery, dynamic binding, and invocation of services within open, scalable environments. Where there is a need to use a SWS infrastructure without human intervention, two very important concepts can be used to describe the issues to be solved: Orchestration and Choreography. These definitions can be applied both to Web Services applications and to agent-based systems, and in general to any system for which the notion of collaboration and planning makes sense, i.e. systems including more than one active entity. The following analogy illustrates these concepts and their differences:

''Consider a dance with more than one dancer. Each dancer has a set of steps that he will perform; they orchestrate their own steps because they are in complete control of their domain (their body). A choreographer ensures that the steps all of the dancers make are according to some overall scheme; we call this a choreography. The dancers have a single view point of the dance, while the choreography has a multi-party or global view point of the dance. Orchestration is about describing and executing a single view point model, while choreography is about describing and guiding a global model. It is possible to derive the single view point model from the global model by projecting based on participant.''

From the Web service perspective, an orchestration is a declarative specification that describes a workflow to support the execution of a specific business process, operation or service; i.e., it describes how Web Services can interact with each other at the message level, including the business logic and execution order of their interactions. On the contrary, from a SWS perspective, in order to automatically orchestrate a set of services, we need to be able to find, select, combine and perform them. Therefore, the following use cases need to be realized: Discovery, Selection, Composition and Invocation.

The aim of this thesis is the formalization of some aspects inherent in the orchestration of Web Services, remaining entirely in the Semantic Web sphere. In particular, it presents an in-depth analysis of Semantic Web Services from the orchestration perspective, including the state of the art and a comparison between the most widely adopted SWS representation languages. A solution for the SWS Composition use case is presented. A prototype based on a backward-chaining algorithm has been implemented using SWRL (Semantic Web Rule Language) as the representation language for OWL-S services. It is an original solution since it has been realized entirely using Semantic Web technologies. Furthermore, there are two Semantic Web problems that unavoidably impact the development of Semantic Web Services: ontology alignment and monotonic knowledge base management. These issues are also discussed, and some possible solutions developed in the Semantic Web context are proposed for Semantic Web Services. Finally, an empirical evaluation of our prototype is presented, followed by the conclusions.
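As a rough illustration of the backward-chaining idea (service names and parameter types are invented, and the sketch ignores the OWL-S/SWRL machinery the thesis actually uses), composition starts from the goal output and recursively satisfies each missing input:

    services = {  # name -> (inputs, outputs), with types as plain strings
        "geocode": ({"Address"}, {"Coordinates"}),
        "weather": ({"Coordinates"}, {"Forecast"}),
    }

    def compose(goal, known, plan=()):
        """Return a goal-first chain of services producing `goal` from the `known` types."""
        if goal in known:
            return list(plan)
        for name, (ins, outs) in services.items():
            if goal in outs and name not in plan:
                subplan = plan + (name,)
                for need in ins - known:      # satisfy every missing input recursively
                    sub = compose(need, known, subplan)
                    if sub is None:
                        break
                    subplan = tuple(sub)
                else:
                    return list(subplan)
        return None

    chain = compose("Forecast", {"Address"})
    print(list(reversed(chain)))   # ['geocode', 'weather'] -> execution order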

September 24, 2010, at 05:11 AM EST by 193.204.187.33 -
Added line 72:
Added line 109:
September 24, 2010, at 05:10 AM EST by 193.204.187.33 -
Changed lines 60-64 from:
to:

Personalization in Digital Libraries for Education

Abstract

Added lines 69-94:

A Machine Learning Approach to Ontology Alignment

Abstract

The main problem that this thesis aims to address is Ontology Alignment. It can be described as the problem of how to move knowledge between different possible representations or formalizations, not in the sense of different knowledge representation formalisms, but in the sense of different conceptualizations within the same expression language and the same knowledge domain. Conceptualization here is intended as ontology, where the key expression is "An ontology is a formalization of a conceptualization". The term ontology is borrowed from philosophy, where its sense is "discourse about being"; in current Computer Science, its meaning could be described as "formalization of relationships between entities, both physical and abstract ones".

In this work, my aim is to use Machine Learning techniques applied to ontologies expressed in Description Logics (DL) formalisms in order to solve some of the issues that arise in addressing an Ontology Alignment problem. Description Logics (a family of knowledge representation formalisms aimed at describing knowledge as concepts and relations between concepts and entities, whether abstract or real-world ones) have been chosen as the logical foundation for many languages that the W3 Consortium has endorsed, in particular those involved in the realization of the Semantic Web: the evolution of the current Web that aims at formally capturing the knowledge expressed by web contents, so that automatic reasoning can be applied to accomplish a wide variety of tasks, such as increasing search effectiveness, simplifying data migration, automating knowledge exchange between systems, and enhancing automatic service discovery and service-composition planning.
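For contrast with the DL-based approach pursued here, a common non-logical baseline simply matches concept names by string similarity. The sketch below (toy concept lists, arbitrary threshold) shows that baseline, i.e. the kind of shallow alignment the thesis aims to go beyond:

    from difflib import SequenceMatcher

    concepts_a = ["PhDStudent", "Professor"]
    concepts_b = ["DoctoralStudent", "FullProfessor", "Course"]

    def name_sim(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def align(src, dst, threshold=0.5):
        # Greedy 1:1 alignment: keep the best-scoring counterpart per source concept.
        pairs = []
        for a in src:
            best = max(dst, key=lambda b: name_sim(a, b))
            if name_sim(a, best) >= threshold:
                pairs.append((a, best, round(name_sim(a, best), 2)))
        return pairs

    print(align(concepts_a, concepts_b))
    # e.g. [('PhDStudent', 'DoctoralStudent', 0.64), ('Professor', 'FullProfessor', 0.82)]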

September 21, 2010, at 04:52 AM EST by 193.206.186.106 -
Added lines 75-86:

Word Sense Disambiguation and Intelligent Information Access

Abstract

In the field of computational linguistics, researchers are mainly concerned with the computational processing of natural language. A number of results have already been obtained, ranging from concrete and applicable systems able to understand or produce language to theoretical descriptions of the underlying algorithms. However, a number of important research problems have not been solved. A particular challenge for computational linguistics, pertaining to all levels of language, is ambiguity. Most people are quite unaware of how vague and ambiguous human languages really are, and they are disappointed when computers are hardly able to understand language and linguistic communication the way humans do. Ambiguity means that a word can be interpreted in more than one way, i.e. it has more than one meaning. Mostly, ambiguity does not pose a problem for humans and is therefore not perceived as such; for a computer, however, ambiguity is one of the main problems encountered in the analysis and generation of natural languages.

Moreover, advances in the Internet and the creation of huge stores of digitized text have opened the gateway to a deluge of information that is difficult to navigate. Although the information is widely available, exploring Web sites and finding information relevant to a user's needs is a challenging task. One of the obstacles is language ambiguity: for example, if you want to find all the documents about bat as a small nocturnal creature, the retrieval system will most probably also return the documents that contain bat as a piece of sports equipment (a club used for hitting a ball in various games). In order to solve this problem, a method able to disambiguate word meanings across documents is needed.

The central argument of this dissertation is the use of Word Sense Disambiguation for Intelligent Information Access. Word Sense Disambiguation (WSD) refers to the resolution of lexical semantic ambiguity and its goal is to attribute the correct sense to a word used in a given context, while Intelligent Information Access is a user-centric and semantically rich approach to access information.

After a brief introduction to the problem of lexical semantic ambiguity, I propose several methods for word sense disambiguation that attempt to disambiguate words by exploiting a semantic knowledge resource such as WordNet. WordNet is a lexical database whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets (synsets), each representing one underlying lexical concept.
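Since the methods lean on WordNet, a minimal gloss-overlap (Lesk-style) sketch may help; it uses NLTK's WordNet interface (the nltk package and its wordnet data are assumed installed) and illustrates the resource, not the thesis's own algorithms:

    from nltk.corpus import wordnet as wn

    def lesk_like(word, context_words):
        """Pick the synset whose gloss shares the most words with the context."""
        context = set(w.lower() for w in context_words)
        def overlap(synset):
            return len(context & set(synset.definition().lower().split()))
        return max(wn.synsets(word), key=overlap)

    sense = lesk_like("bat", ["nocturnal", "flying", "mammal", "cave"])
    print(sense.name(), "-", sense.definition())
    # e.g. bat.n.01 - nocturnal mouselike mammal with forelimbs modified to form membranous wings ...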

The second part of the dissertation takes into account the evaluation of WSD strategies. In particular, the proposed algorithms are first tested independently of any application, using specially constructed benchmarks, and then evaluated in terms of their contribution to the overall performance of systems designed for Intelligent Information Access, such as Semantic Searching and Intelligent User Profiling.

September 21, 2010, at 04:48 AM EST by 193.204.187.140 -
Changed lines 11-27 from:
to:

Learning User Profiles from Text for Personalized Information Access

Abstract

Advances in the Internet and the creation of huge stores of digitized text have opened the gateway to a deluge of information that is difficult to navigate. Although the information is widely available, exploring Web sites and finding information relevant to a user's interests is a challenging task. The first obstacle is research: you must first identify the appropriate information sources and then retrieve the relevant data. Then, you have to sort through this data to filter out the unfocused and unimportant information. Lastly, in order for the information to be truly useful, you must take the time to figure out how to organize and abstract it in a manner that is easy to understand and analyze. To say the least, all of these steps are extremely time-consuming. This "relevant information problem" leads to a clear demand for automated methods able to support users in searching large document repositories in order to retrieve relevant information with respect to their preferences. Catching user interests and representing them in a structured form is a problematic activity. Algorithms designed for this purpose base their relevance computations on so-called user profiles, in which representations of the users' interests are maintained. The central argument of this dissertation is the use of Supervised Machine Learning techniques to induce user profiles from text data for Intelligent Information Access. Intelligent Information Access is a user-centric and semantically rich approach to accessing information: information preferences vary greatly across users, therefore information access must be highly personalized by profiles to serve the individual interests of the user. Moreover, users want to retrieve information on the basis of conceptual content, but individual words provide unreliable evidence about the meaning of documents. Thus, methods for extracting meaning from documents must be considered in order to effectively find relevant information.

First, we describe content-based learning algorithms designed to learn about users' interests. The input, given as a set of text documents marked by the user as relevant or not relevant, is used to find characteristics that distinguish relevant documents from irrelevant ones. The induced target concept is a user profile appropriate for the classification of new documents. Documents are represented as bag of words (BOW): a document is encoded as a feature vector, with each element in the vector indicating the presence or absence of a word in the document. This approach was used as a baseline to determine how well a standard keyword-based learner performs on this task.
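A minimal sketch of that keyword-based baseline (toy documents and labels; scikit-learn assumed available): documents become presence/absence vectors and a simple classifier induces the profile:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import BernoulliNB

    docs = ["semantic web ontologies and reasoning",
            "machine learning for user profiling",
            "football match results and league tables",
            "celebrity gossip and fashion news"]
    labels = [1, 1, 0, 0]          # 1 = relevant to this user, 0 = not relevant

    vec = CountVectorizer(binary=True)      # BOW: presence/absence of each word
    X = vec.fit_transform(docs)
    profile = BernoulliNB().fit(X, labels)  # the induced user profile

    new_doc = ["reasoning about user profiling"]
    print(profile.predict(vec.transform(new_doc)))   # [1] on this toy data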

Second, the current limits of the state of the art in profiles generated from BOW-represented documents are analyzed. Though many linguistic techniques have been employed, problems such as polysemy and synonymy still remain unsolved. A possible solution for these issues is explored: shifting the level of abstraction from words up to concepts. Profiles will no longer contain words; they will contain references to concepts defined in lexicons or, in a further step, ontologies. A first advance in this direction consists of employing WordNet as a reference lexicon and substituting word forms with word meanings in profiles. We show how the described content-based algorithms can be extended using a new, enriched document representation obtained by adding features generated through a new WordNet-based procedure.

The dissertation concludes with the description of the empirical study that evaluates the effectiveness of the proposed approach.

September 20, 2010, at 04:46 PM EST by 93.43.209.83 -
Changed line 25 from:

Recommender systems constitute one of the fastest growing segments of the Internet economy today. They help reduce information overload and provide customized information access for targeted domains. Such systems take input directly or indirectly from users and, based on their needs, preferences and ''usage patterns', provide personalized advices about products or services and can help people to filter useful information, thus giving users easing the information search and decision processes.

to:

Recommender systems constitute one of the fastest growing segments of the Internet economy today. They help reduce information overload and provide customized information access for targeted domains. Such systems take input directly or indirectly from users and, based on their needs, preferences and usage patterns, provide personalized advice about products or services, helping people to filter useful information and thus easing their information search and decision processes.

September 20, 2010, at 04:45 PM EST by 93.43.209.83 -
Added lines 17-39:

Hybrid Recommendation Techniques based on User Profiles

Abstract

Nowadays users are overwhelmed by the abundant amount of information, and this is not a problem just for a minority of the population; it is a problem for everyone in their daily life. In fact, we no longer get information just from newspapers, colleagues, family members and friends, but also, to a large extent, from the Internet.

How can people deal with this information overload problem? Individuals tend to filter and ignore information as effective ways of coping with information overload.

Recommender systems constitute one of the fastest growing segments of the Internet economy today. They help reduce information overload and provide customized information access for targeted domains. Such systems take input directly or indirectly from users and, based on their needs, preferences and ''usage patterns', provide personalized advice about products or services, helping people to filter useful information and thus easing their information search and decision processes.

Among the different recommendation techniques proposed in the literature, the collaborative filtering approach is the most successful and widely adopted to date. Collaborative filtering by itself cannot always guarantee a good prediction. The effectiveness of predictions relies on the confidence of the computation of the similarity between users. Correlation between users can only be computed if they have rated a sufficient number of common items. Since users can choose among thousands of items to rate, especially in online catalogues, and new items become available continuously, the overlap of rated items between two users is likely to be minimal in many cases. Therefore, many of the computed correlation coefficients are based on just a few observations. As a result, correlation based only on co-rated items cannot be regarded as a reliable similarity measure.
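The reliability issue is easy to see in code. In this minimal sketch (toy ratings), Pearson correlation is computed on co-rated items only, and with just two shared items it is forced to plus or minus one, which says little about real similarity:

    from math import sqrt

    def pearson_on_corated(ru, rv):
        common = set(ru) & set(rv)          # co-rated items only
        n = len(common)
        if n < 2:
            return None                     # not enough overlap to correlate at all
        xs, ys = zip(*[(ru[i], rv[i]) for i in common])
        mx, my = sum(xs) / n, sum(ys) / n
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
        return num / den if den else 0.0

    u = {"item1": 5, "item2": 1, "item3": 4}
    v = {"item2": 2, "item3": 5, "item9": 3}
    print(pearson_on_corated(u, v))   # 1.0 from just two co-rated items: unreliable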

One of the primary contributions of this thesis is the investigation of how knowledge about users can be exploited to improve recommendations. In particular, it investigates how overlaps between users' interests can be used to define the similarity among users in order to improve recommendations.

The combination of classic collaborative filtering techniques with user profiles inferred using content-based methods is presented as the basis for designing a new hybrid recommendation technique.

The study describes the process of learning content-based profiles to be used in one of the main steps of the process for producing social recommendations: neighborhood formation. A clustering technique for grouping user profiles is proposed in order to identify the set of neighbors for those users for whom recommendations must be produced.

More specifically, the process of grouping user profiles learns two different profiles of the user: one, built from positive examples of interesting items, represents the interests of the user; the other, learned from negative examples, represents the items the user dislikes. The observation is that two users can be considered similar not only if they like the same items, but also if they dislike the same ones.

Finally, advanced semantic user profiles based on concepts instead of keywords have been used for improving the accuracy of collaborative recommendations.

Several experiments have been carried out in order to evaluate the effectiveness of the approaches. Some baseline experiments on classic collaborative filtering have been carried out as benchmark. The final experimental analysis provides evidence of the improvements of the proposed approaches.

September 20, 2010, at 05:14 AM EST by 193.204.187.33 -
Deleted lines 19-23:



Luigi Iannone

July 29, 2010, at 10:31 AM EST by 193.204.187.33 -
Changed line 9 from:

Degemmis Marco

to:

Marco de Gemmis

July 29, 2010, at 05:13 AM EST by 193.204.187.33 -
Added line 11:
Deleted lines 13-14:



Added line 16:
Deleted lines 18-19:



Added line 21:
Deleted lines 23-24:



Added line 26:
Deleted lines 28-29:



Added line 31:
Deleted lines 33-34:



Added line 36:
Deleted lines 38-39:



Added line 41:
Deleted lines 43-44:



Added line 46:
Deleted lines 48-49:



Added line 50:
July 29, 2010, at 05:11 AM EST by 193.204.187.33 -
Deleted line 59:

\\\

July 29, 2010, at 05:11 AM EST by 193.204.187.33 -
Deleted line 61:
July 29, 2010, at 05:09 AM EST by 193.204.187.33 -
Added lines 1-139:

Ph.D. Graduates





Degemmis Marco






Paquale Lops






Oriana Licchelli






Luigi Iannone






Ignazio Palmisano






Domenico Redavid






Pierpaolo Basile






Eufemia Tinelli






Anna Lisa Gentile






Leo Iaquinta

Serendipity in Context: Context-aware Recommendations of Serendipitous Items

Abstract

When a person searches for a piece of information about a topic, she finds so much information available that she can hardly unearth the web pages, books, papers, articles, music, videos, etc. actually relevant to the searched topic. For instance, most search engines on the Internet return thousands of results for every query, while only a few of those results are really relevant for the searcher, and they are not always at the top of the returned list. Furthermore, what is relevant and interesting for one searcher may not be relevant and interesting for another, even if they submit the same query.

The extensive options lead the user to feel that she loses control over handling the amount of information, and she becomes worried that something interesting or important is being missed. This problem is often referred to as information overload.

Recommender systems help to reduce information overload and provide customized information access for targeted domains. Such systems take direct or indirect input from users and, based on their needs, preferences and usage patterns, provide personalized advices about products or services so that users are assisted to filter useful information.

Recommender systems became an important research area with the appearance of the first papers on collaborative filtering in the mid-1990s. Much work has been done, both in industry and academia, to develop and improve new approaches to recommendation over the last decade. The interest in this area remains high because it constitutes a problem-rich research area and because of the many practical applications that help users deal with information overload and provide them with personalized recommendations, content and services. In addition, despite all the advances, the current generation of recommender systems still requires further improvements to make recommendation methods more effective and applicable to an even broader range of real-life applications. These improvements include better methods for representing user behavior and the information about the items to be recommended, more advanced recommendation modeling methods, and the exploitation of contextual information in the recommendation process.

For some approaches, such as the content-based one, the item representation plays a key role, thus choosing proper facets to represent items is a fundamental task for deploying effective recommender systems. Contextual facets are often marginally relevant for learning and predicting user preferences, but in some domains disregarding contextual facets makes recommendations useless. Consequently, the thesis deals with the contextual dimension, proposing a strategy to improve the effectiveness of a content-based recommender system through the exploitation of contextual facets. The demonstrative scenario concerns the dynamic suggestion of personalized tours within a museum: the contextual facets deal with the physical layout of the items and the interaction of users with the physical environment.

The thesis also deals with the serendipitous dimension. Indeed, recommender systems commonly recommend items that score highly against a user’s profile and, consequently, the user is recommended items similar to those already rated. When this feature becomes a limitation, the recommender system suffers from over-specialization, which damages the common expectations concerning novelty and surprise. Indeed, novelty occurs when the system suggests an unknown item that the user might have autonomously discovered. On the other hand, a serendipitous recommendation helps the user find a surprisingly interesting item that she might not otherwise have discovered (or that would have been really hard to discover). Although serendipity is a difficult concept to research, because it is by definition not particularly susceptible to systematic control and prediction, the thesis deals with the serendipitous dimension, proposing a strategy to mitigate over-specialization by exploiting the learned user profiles.
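One simple way to act on that idea (an assumption for illustration, not the thesis's exact strategy) is to re-rank candidates by predicted interest discounted by similarity to the items the profile was learned from:

    tags = {"monet_room":    {"impressionism", "painting"},
            "picasso_room":  {"cubism", "painting"},
            "egyptian_wing": {"archaeology", "sculpture"}}

    def sim(a, b):                     # Jaccard similarity over toy item tags
        return len(tags[a] & tags[b]) / len(tags[a] | tags[b])

    def rerank(candidates, profile_items, weight=0.5):
        """candidates: (item, predicted_interest); higher score = recommend first."""
        def unexpectedness(item):
            return 1.0 - max(sim(item, p) for p in profile_items)
        return sorted(candidates,
                      key=lambda c: (1 - weight) * c[1] + weight * unexpectedness(c[0]),
                      reverse=True)

    print(rerank([("picasso_room", 0.9), ("egyptian_wing", 0.6)], ["monet_room"]))
    # the Egyptian wing overtakes the more profile-similar Picasso room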

Finally, the contextual dimension and the serendipitous dimension are synergistic. Indeed, the contextual dimension is used to refine the selection of supposedly serendipitous items and to provide a practical interpretation of the serendipity-augmented recommendation task. On the other hand, the serendipity dimension allows increased dynamicity to be introduced into the handling of contextual facets.




\\\