AI*IA'99

Tutorial

Bologna, 14 September 1999


	Knowledge modelling: Foundations and Applications (Enrico Motta, Open University - UK)
	Methodologies and Applications of Data Mining (Donato Malerba, Università di Bari, and Lorenza Saitta, Università del Piemonte Orientale "Amedeo Avogadro")
	Information Extraction from Text: an Introduction (Roberto Basili, Università Tor Vergata, and Fabio Ciravegna, ITC-IRST Trento)
	Constraint Programming approaches to AI applications (Michela Milano, Università di Bologna)
	Artificial Vision and Applications (Antonio Chella, Università di Palermo)

Abstracts and Speakers' Curricula Vitae

Knowledge modelling: Foundations and Applications

Enrico Motta (Open University - UK)

Abstract

This tutorial will focus on knowledge modelling. I will outline the history of the knowledge modelling paradigm, emphasise its organic synergy with research in knowledge sharing and reuse and discuss the state of the art in knowledge modelling technology. I will also highlight the importance of knowledge modelling technology for a number of areas, including knowledge-based system specification, knowledge acquisition, enterprise modelling, information retrieval and knowledge management. In particular I will present in some detail the two main technologies which have been developed in the knowledge modelling area: ontologies and problem solving methods. I will describe the role of these technologies, the relevant modelling languages, support tools and applications. During the tutorial I also aim to show live demos of web-based tools for knowledge modelling. Finally, I will discuss real-world applications of knowledge modelling technology in the domains of engineering design, news publishing and knowledge management on the World-Wide-Web. The tutorial is aimed at a generally computer-literate audience. However, basic knowledge of AI and knowledge representation will be an advantage.

Methodologies and Applications of Data Mining

Lorenza Saitta (Università del Piemonte Orientale "Amedeo Avogadro")

Donato Malerba (Università di Bari)

Abstract

Knowledge Discovery in Databases (KDD) is "...the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data". Data Mining (DM) is currently understood to be one part of the whole KDD process, namely the application of specific algorithm(s) to pre-processed data. The KDD process involves a variety of tasks:

Classification
Segmentation
Characterization
Discovery of functional dependencies or associations
Outlier identification
Time series analysis

Several disciplines contribute to KDD, notably:

Statistics
Pattern Recognition
Artificial Intelligence (Machine Learning, Intelligent Agents)
Databases ("Query and reporting", "Data warehousing")
Visualization (Graphics, Multimedia, ...)

Historically, approaches to DM can be grouped into two broad classes: "Verification-driven" and "Discovery-driven".

"Verification-Driven" Approaches

A "Verification-driven" approach relies heavily on the experience of a human expert, who formulates a hypothesis to be verified against the available data. Typical verification-driven methodologies are:

"Query and Reporting"
Spreadsheets
Multidimensional Analysis
Statistical Analyses
- Experiment design and Sampling
- Graphical Statistics
- Dimensionality reduction
- Hypothesis testing and Model ranking
- Classification and Regression
- "Exploratory Data Analysis" and Clustering

"Discovery-Driven" Approaches

In "Discovery-driven" approaches, hypotheses are formulated bottom-up, starting from the data. Interesting patterns in the data are detected and then evaluated by the user/expert. Most discovery-driven approaches have their roots in Artificial Intelligence and, specifically, in the following fields:

Symbolic Machine Learning:
- Decision trees
- Rules
- Conceptual clustering
Neural Networks
Genetic Algorithms
Bayesian Networks
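
As a minimal illustration of the discovery-driven style, the sketch below enumerates simple one-to-one association rules directly from data and keeps those that pass support and confidence thresholds, leaving their evaluation to the user. The transactions, the thresholds and the function name discover_rules are assumptions made for this example only; they are not taken from the tutorial material.

```python
from itertools import permutations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (lhs, rhs, support, confidence)


def discover_rules(transactions: List[Set[str]],
                   min_support: float,
                   min_confidence: float) -> List[Rule]:
    """Enumerate rules 'lhs -> rhs' between single items and keep the strong ones."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules: List[Rule] = []
    for lhs, rhs in permutations(items, 2):
        lhs_count = sum(1 for t in transactions if lhs in t)
        both = sum(1 for t in transactions if lhs in t and rhs in t)
        support = both / n              # fraction of transactions with both items
        confidence = both / lhs_count   # fraction of lhs transactions that also hold rhs
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, rhs, support, confidence))
    return rules


# Toy data (hypothetical): each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

rules = discover_rules(transactions, min_support=0.3, min_confidence=0.8)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

No hypothesis is supplied in advance: the candidate patterns come from the data, and the thresholds only filter what is shown to the expert. A verification-driven analysis would instead start from one rule proposed by the expert and check it against the same data.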

Special attention is given, in DM, to the integration of several methodologies, for instance, symbolic machine learning and statistics.

Evolution of Data Storage

Data analysis techniques evolved hand in hand with data storage techniques. In the past, it was strongly recommended to adopt a centralized model, which supports an integrated, non-redundant storage of the data of interest and facilitates data analysis for decision support. Nevertheless, the dynamism of business realities and the difficulties in coordinating different sectors led organizations to equip themselves with several distributed, heterogeneous and autonomous databases. In this context, the answer to the need for executive decision support has been the data warehouse, that is, an integrated database of historical, non-volatile data. The design of a data warehouse includes the selection and transformation of data available in the source databases, as well as the definition of the related metadata, while the feeding of a data warehouse requires data extraction and cleaning. This strong analogy with the preliminary steps of the KDD process motivates the synergy between the areas of data analysis and information systems.

The alternative to the costly design and implementation of a data warehouse is the interconnection of a set of "data marts", that is, less expensive thematic data warehouses built for individual departments and strategic business units. A recent trend in the area of information systems is the transformation of Database Management Systems into Distributed Knowledge Management Systems. Thus, data mining is becoming an important component of business intelligence, which denotes the set of processes, techniques and tools that support decision making through the application of modern information technologies.

Data Mining Applications

Although Data Mining might be applied to virtually all domains, promising results have already been reported in some specific applications, namely:

Industrial applications to the banking and financial domains (fraud prevention, risk assessment, profit analysis), to the telecommunication domain (fraud detection, customer churn), and to the retail domain (customer segmentation, analysis of customer satisfaction).
Web Mining applications, such as discovery of resources (documents and services) in the Web, information extraction from discovered resources, and automated acquisition of web browsing skills.
Applications to spatial data analysis (Spatial Data Mining), where spatial data refer to objects that carry both spatial and descriptive properties. Geographic Information Systems and CAD systems are the main applications that make use of spatial data.
Applications to the analysis of large collections of textual documents (Text Mining) available in information retrieval systems. The problems raised in these applications partially overlap with those of Web Mining.

Conclusions

The most important novelty in KDD and DM is the tendency to use multi-strategy approaches and the need to apply them to very large and complex real-world applications. Still, several problems remain to be solved. At the methodological level, the most relevant open problems are the following:

Handling relational data, which require a first-order logic language to be described.
Use of prior domain knowledge.
Effective interaction with the user and pattern comprehensibility.
Feature construction.
Handling dynamic data.
Missing data, Noise, and Overfitting.
True scalability

At the system level, too, some problems are still only partially solved, for instance:

Flexible integration of different approaches
Multimedia data integration
Integration with sensors and actuators
Implementation on intranets, extranets and the Internet

Donato Malerba is an associate professor at the University of Bari in the Department of Informatics, where he teaches the courses "Data Bases and Knowledge Bases" and "Computer Programming II". For the past decade, he has been active in machine learning and its applications to intelligent document processing, knowledge discovery in databases, map interpretation, and intelligent interfaces. He has published several papers in refereed conferences and journals. He received the best paper award for a paper presented at the Symposium on "Knowledge Discovery in Databases" - 13th European Meeting on Cybernetics and Systems Research. He has served on the program committees of the International Conference on Machine Learning (ICML'96, ICML'99), of the AI*IA workshop on Machine Learning and Natural Language Processing (Turin, December 1997), and of the ICML'99 Workshop on "Text Mining: Foundations, Techniques and Applications". He is currently involved in the ESPRIT project SODAS 20821 (Symbolic Official Data Analysis System).

Lorenza Saitta is Full Professor of Computer Science at the University of Piemonte Orientale "Amedeo Avogadro" (Italy). She started her research activity in Pattern Recognition, moving soon to AI, specifically to the area of Fuzzy Logic for Expert Systems. In 1984 she started working in Machine Learning, thus initiating research in the field in Italy. Her interests moved from inductive symbolic approaches (which produced the systems ML-SMART and RIGEL for learning first-order logic decision rules, both applied to real-world problems) towards integrated learning strategies based on more complex reasoning schemes that also involve deductive and abductive methodologies (system WHY), and towards the definition and use of abstraction mechanisms for knowledge representation. More recently, she has also become interested in Genetic Algorithms (system REGAL) and in links with Cognitive Sciences.

She has authored (or edited) four books and more than 130 papers in journals, books and international conferences. She is (or has been) a member of the Editorial Boards of various journals and of the Program Committees of many international conferences, notably the International Machine Learning Conference '92, '93, '94, '97, '98, '99, the European Machine Learning Conf. '91, '93, '94, '96, ECAI-92, and IJCAI-97 (as the person responsible for the Machine Learning area).

She is an Action Editor for the Machine Learning Journal and the head of the Research Technical Committee of the EEC Network of Excellence for Machine Learning (MLNet II), and has been a Co-Director of the European Science Foundation project on "Learning in Human and Machine". She has been responsible for, or has participated in, several European Research Projects.

She gave an Invited Survey on Machine Learning at ECAI-92, and has been an Invited Speaker at the Int. Joint Conf. on Artificial Intelligence (IJCAI-93), the European Conf. on Machine Learning (ECML-94), the Int. Workshop on Inductive Logic Programming (ILP-94), the Int. Workshop on Artificial Intelligence and Cognitive Science (1994), and the Multistrategy Learning Workshop (1996).

She has been the Chairperson of the Int. Conference on Machine Learning in 1996.

Moreover, she has been Co-Chairperson of the IPMU Conference (1988), of the ISMIS Conference (1988), and of the 4th Int. Workshop on Multistrategy Learning (1998).

Information Extraction from Text: an Introduction

Roberto Basili (Università Tor Vergata - Roma)

Fabio Ciravegna (ITC-IRST - Trento)

Abstract

A system for information extraction (IE) from text automatically extracts a set of predefined information from real-world texts. This information is generally summarized in a tabular format (i.e. in a user-defined template). The user can then be presented with such a summary, or a database can be populated with the extracted information. Historically, the first relevant activities in IE were carried out in the US, where comparative evaluations (the Message Understanding Conferences) have taken place since the end of the 1980s. In recent years, IE applications have been defined in fields ranging from finance to medicine to industrial diagnosis. The aim of this tutorial is to provide the participants with:

an introduction to IE,
a description of some relevant industrial experiences,
an overview of both scientific and applicative perspectives for IE.
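
The idea of filling a user-defined template from free text can be sketched with a toy pattern-based extractor. Everything specific here is an assumption made for illustration: the acquisition template with its buyer, target, amount and unit fields, the sample sentences, and the function name extract_templates. Real IE systems, such as those evaluated at the Message Understanding Conferences, rely on far richer linguistic processing than a single regular expression.

```python
import re
from typing import Dict, List

# Hypothetical template for company acquisitions; the field names are assumptions.
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z][A-Za-z]+) (?:acquired|bought) (?P<target>[A-Z][A-Za-z]+)"
    r" for \$(?P<amount>\d+(?:\.\d+)?) (?P<unit>million|billion)"
)


def extract_templates(text: str) -> List[Dict[str, str]]:
    """Return one filled template (a dict of field -> string) per match in the text."""
    return [match.groupdict() for match in ACQUISITION.finditer(text)]


text = (
    "Yesterday Acme acquired Globex for $12.5 million. "
    "Analysts were surprised. Later, Initech bought Hooli for $2 billion."
)

templates = extract_templates(text)
for row in templates:
    print(row)
```

Each dictionary corresponds to one row of the tabular summary, which could be shown to the user or inserted into a database table with the same columns.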

Roberto Basili is a researcher at Università di Roma Tor Vergata, in the Natural Language Processing Group. He has participated in a number of projects in the field of IE, among them ECRAN, a project funded by the European Union for the definition of a new generation of IE systems. His work in ECRAN mainly concerned models for lexical acquisition.

Fabio Ciravegna is responsible for the IE project at ITC-irst, Trento. He has been the ITC-irst manager of FACILE, a project funded by the European Union for text classification and IE. Within FACILE he was coordinator of the IE activity for the whole project. FACILE was selected by the EU as one of the most successful EU-funded projects in the area of Language Engineering for 1998 (IST98 conference, Vienna). Fabio Ciravegna's work mainly concentrates on architectures and applications for IE, and on parsing technologies for text analysis. From 1988 to 1993 he was a researcher at Centro Ricerche Fiat, where he coordinated the IE project.

Constraint Programming approaches to AI applications

Michela Milano (Università di Bologna)

Slides (.pdf version)

Abstract

Many problems in the field of Artificial Intelligence can be modelled and solved as Constraint Satisfaction Problems. A programming paradigm suitable for modelling and solving such problems is Constraint Programming (CP). CP has been widely used in recent years for solving large real-life problems, thanks to its modelling flexibility and to efficient propagation algorithms that prune the search space.

The goal of the tutorial will be to present some preliminaries on Constraint Programming and to show how some combinatorial optimization problems can be modelled and solved with CP. Problems such as scheduling, planning, timetabling and routing will be considered. Some commercial CP systems will be described, with special attention to the search strategies and propagation algorithms they use.
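
To make the interplay of search and propagation concrete, here is a minimal sketch of a constraint satisfaction solver applied to a tiny timetabling problem: exams that share students must not be placed in the same time slot. It uses plain backtracking search with forward checking as the propagation step and a smallest-domain-first variable ordering. The exam names, the conflict list and all function names are assumptions made for this example; it does not reproduce any of the commercial CP systems covered in the tutorial.

```python
from typing import Dict, List, Optional, Set, Tuple

Domains = Dict[str, Set[int]]
Neighbours = Dict[str, Set[str]]


def build_neighbours(variables: List[str],
                     conflicts: List[Tuple[str, str]]) -> Neighbours:
    """Turn a list of 'must differ' pairs into an adjacency map."""
    neighbours: Neighbours = {v: set() for v in variables}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def forward_check(var: str, value: int, domains: Domains,
                  neighbours: Neighbours) -> Optional[Domains]:
    """Assign var=value on a copy of the domains and prune the neighbours.

    Returns None if some neighbour is left with an empty domain.
    """
    pruned = {v: set(d) for v, d in domains.items()}
    pruned[var] = {value}
    for other in neighbours[var]:
        pruned[other].discard(value)
        if not pruned[other]:
            return None
    return pruned


def solve(domains: Domains, neighbours: Neighbours,
          assignment: Optional[Dict[str, int]] = None) -> Optional[Dict[str, int]]:
    """Backtracking search; returns a full assignment or None if none exists."""
    assignment = dict(assignment or {})
    unassigned = [v for v in domains if v not in assignment]
    if not unassigned:
        return assignment
    # Search strategy: choose the variable with the smallest remaining domain.
    var = min(unassigned, key=lambda v: len(domains[v]))
    for value in sorted(domains[var]):
        pruned = forward_check(var, value, domains, neighbours)
        if pruned is None:
            continue  # propagation detected a dead end, try the next value
        result = solve(pruned, neighbours, {**assignment, var: value})
        if result is not None:
            return result
    return None


# Hypothetical instance: four exams, three time slots numbered 0..2.
exams = ["AI", "Logic", "Vision", "Robotics"]
conflicts = [("AI", "Logic"), ("AI", "Vision"),
             ("Logic", "Vision"), ("Vision", "Robotics")]
neighbours = build_neighbours(exams, conflicts)
schedule = solve({exam: {0, 1, 2} for exam in exams}, neighbours)
print(schedule)
```

Forward checking removes the chosen slot from the domains of conflicting exams as soon as an assignment is made, so inconsistent branches are cut before they are explored. Stronger propagation (for example arc consistency) and other variable or value orderings fit into the same skeleton.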

Artificial Vision and Applications

Antonio Chella (Università di Palermo)

Abstract

The main goal of the tutorial is to situate computer vision in the field of artificial intelligence research. The main topics of the tutorial are:

the review of the main computer vision architectures, starting from the architecture proposed by Marr with its related developments, to the several architectures of active vision;
the analysis of the problems related to the symbolic interpretation of images and the most recent systems proposed in the literature;
the presentation of some effective industrial applications of computer vision;
an overview of current research on computer vision systems for autonomous robotics.

Antonio Chella was born in Florence on March 4, 1961. He received his laurea degree in Electronic Engineering in 1988 and his Ph.D. in Computer Science in 1993 from the University of Palermo, Italy. Currently, he is an associate professor of Robotics at the Department of Electrical Engineering of the University of Palermo. His research interests are in the field of autonomous robotics, artificial vision, neural networks, hybrid (symbolic/subsymbolic) systems and knowledge representation.
