Enrico Motta, Open University - UK
Donato Malerba, Università di Bari, and Lorenza Saitta, Università del Piemonte Orientale "Amedeo Avogadro"
Roberto Basili, Università Tor Vergata, and Fabio Ciravegna, ITC-IRST Trento
Michela Milano, Università di Bologna
Antonio Chella, Università di Palermo
Abstracts and Speakers' Curricula Vitae
This tutorial will focus on knowledge modelling. I will outline the history of the knowledge modelling paradigm, emphasise its organic synergy with research in knowledge sharing and reuse, and discuss the state of the art in knowledge modelling technology. I will also highlight the importance of knowledge modelling technology for a number of areas, including knowledge-based system specification, knowledge acquisition, enterprise modelling, information retrieval and knowledge management. In particular, I will present in some detail the two main technologies which have been developed in the knowledge modelling area: ontologies and problem solving methods. I will describe the role of these technologies, the relevant modelling languages, support tools and applications. During the tutorial I also aim to show live demos of web-based tools for knowledge modelling. Finally, I will discuss real-world applications of knowledge modelling technology in the domains of engineering design, news publishing and knowledge management on the World-Wide Web. The tutorial is aimed at a generally computer-literate audience; however, a basic knowledge of AI and knowledge representation will be an advantage.
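To give a concrete flavour of what an ontology provides, the following is a minimal, hypothetical sketch (not from any actual knowledge modelling language): a class hierarchy with slot (attribute) inheritance and a subsumption test. The class and slot names are invented for illustration; frame-based ontology languages are of course far richer.

```python
# Hypothetical toy ontology: each class maps to (parent, own slots).
ONTOLOGY = {
    "thing":    (None,       {}),
    "artifact": ("thing",    {"made_of": "material"}),
    "vehicle":  ("artifact", {"wheels": "integer"}),
    "bicycle":  ("vehicle",  {"gears": "integer"}),
}

def slots(cls):
    """Collect the slots of a class, inheriting from all its ancestors."""
    inherited = {}
    while cls is not None:
        parent, own = ONTOLOGY[cls]
        inherited = {**own, **inherited}  # more specific slots take precedence
        cls = parent
    return inherited

def is_a(cls, ancestor):
    """Subsumption test: does `cls` specialize `ancestor`?"""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = ONTOLOGY[cls][0]
    return False

print(slots("bicycle"))            # gears, wheels and made_of all apply
print(is_a("bicycle", "artifact"))
```

A problem solving method would then be modelled separately, as a reusable reasoning strategy that can be configured against such an ontology.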
Knowledge Discovery in Databases (KDD) is "...the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data". Data Mining (DM) is currently understood to be one part of the whole KDD process, namely the application of specific algorithms to pre-processed data. The KDD process involves a variety of tasks:
Several disciplines contribute to KDD, notably:
Historically, approaches to DM can be grouped into two broad classes: "verification-driven" and "discovery-driven".
A "verification-driven" approach relies heavily on the experience of a human expert, who formulates a hypothesis to be verified against the available data. Typical verification-driven methodologies are:
In "discovery-driven" approaches, hypotheses are formulated bottom-up, starting from the data: interesting patterns in the data are noticed and then evaluated by the user/expert. Most discovery-driven approaches have their roots in Artificial Intelligence, and specifically in the fields of:
In DM, special attention is given to the integration of several methodologies, for instance symbolic machine learning and statistics.
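The discovery-driven style can be illustrated with a small, self-contained sketch: an Apriori-style search that proposes frequent itemsets bottom-up from toy transaction data, leaving their evaluation to the analyst. The data and threshold are invented for illustration only.

```python
from itertools import combinations

# Toy transaction data (hypothetical): each row lists items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def frequent_itemsets(transactions, min_support=0.6):
    """Return itemsets whose support meets the threshold (Apriori-style)."""
    items = sorted(set.union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            support = sum(set(candidate) <= t for t in transactions) / n
            if support >= min_support:
                result[candidate] = support
                found = True
        if not found:  # Apriori pruning: no superset of an infrequent set is frequent
            break
    return result

print(frequent_itemsets(transactions))
```

The enumeration here is brute-force within each level; real DM algorithms generate candidates only from the frequent sets of the previous level, but the bottom-up, data-driven character is the same.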
Evolution of Data Storage
Data analysis techniques have evolved hand in hand with data storage techniques. In the past, it was strongly recommended to adopt a centralized model, which supports an integrated, non-redundant storage of the data of interest and facilitates data analysis for decision support. Nevertheless, the dynamism of business realities and the difficulty of coordinating different sectors led organizations to equip themselves with several distributed, heterogeneous and autonomous databases. In this context, the answer to the requirement for executive decision support has been the data warehouse, that is, an integrated database of historical, non-volatile data. While the design of a data warehouse includes the selection and transformation of the data available in the source databases, as well as the definition of the relevant metadata, populating a data warehouse requires data extraction and cleaning. This strong analogy with the preliminary steps of the KDD process motivates the synergy between the areas of data analysis and information systems.
The alternative to the costly design and implementation of a data warehouse is the interconnection of a set of "data marts", that is, less expensive thematic data warehouses built for individual departments and strategic business units. A recent trend in the area of information systems is the transformation of Data Base Management systems into Distributed Knowledge Management systems. Thus, data mining is becoming an important component of business intelligence, which denotes the set of processes, techniques and tools that support decision making through the application of modern information technologies.
Data Mining Applications
Although Data Mining can in principle be applied to virtually any domain, promising results have already been reported in some specific applications, namely:
The most important novelty in KDD and DM is the tendency to use multi-strategy approaches and the need to apply them to very large and complex real-world applications. Still, several problems remain to be solved. At the methodological level, the most relevant open problems are the following:
Donato Malerba is an associate professor in the Department of Informatics at the University of Bari, where he teaches the courses "Data Bases and Knowledge Bases" and "Computer Programming II". For the past decade he has been active in machine learning and its applications to intelligent document processing, knowledge discovery in databases, map interpretation, and intelligent interfaces. He has published several papers in refereed conferences and journals. He received the best paper award for a paper presented at the Symposium on "Knowledge Discovery in Databases" at the 13th European Meeting on Cybernetics and Systems Research. He has served on the program committees of the International Conference on Machine Learning (ICML'96, ICML'99), of the AI*IA workshop on Machine Learning and Natural Language Processing (Turin, December 1997), and of the ICML'99 Workshop on "Text Mining: Foundations, Techniques and Applications". He is currently involved in the ESPRIT project SODAS 20821 (Symbolic Official Data Analysis System).
Lorenza Saitta is Full Professor of Computer Science at the University of Piemonte Orientale "Amedeo Avogadro" (Italy). She started her research activity in Pattern Recognition, soon moving to AI, specifically to the area of Fuzzy Logic for Expert Systems. In 1984 she started working in Machine Learning, thus initiating research in the field in Italy. Her interests moved from inductive symbolic approaches (which produced the ML-SMART and RIGEL systems for learning first-order logic decision rules, applied to real-world problems) towards integrated learning strategies based on more complex reasoning schemes that also involve deductive and abductive methodologies (the WHY system), and towards the definition and use of abstraction mechanisms for knowledge representation. More recently, she has also become interested in Genetic Algorithms (the REGAL system) and in links with the Cognitive Sciences.
She has authored (or edited) four books and more than 130 papers in journals, books and international conferences. She is (or has been) a member of the Editorial Boards of various journals and of many international conference Program Committees, notably the International Machine Learning Conference '92, '93, '94, '97, '98, '99, the European Machine Learning Conf. '91, '93, '94, '96, ECAI-92, and IJCAI-97 (as responsible for the Machine Learning area).
She is an Action Editor of the Machine Learning Journal, is responsible for the Research Technical Committee of the European Network of Excellence for Machine Learning (MLNet II), and has been a Co-Director of the European Science Foundation project "Learning in Human and Machine". She has been responsible for, or has participated in, several European research projects.
She gave an Invited Survey on Machine Learning at ECAI-92, and has been an Invited Speaker at the Int. Joint Conf. on Artificial Intelligence (IJCAI-93), the European Conf. on Machine Learning (ECML-94), the Int. Workshop on Inductive Logic Programming (ILP-94), the Int. Workshop on Artificial Intelligence and Cognitive Science (1994), and the Multistrategy Learning Workshop (1996).
She has been the Chairperson of the Int. Conference on Machine Learning in 1996.
Moreover, she has been Co-Chairperson of the IPMU Conference (1988), of the ISMIS Conference (1988), and of the 4th Int. Workshop on Multistrategy Learning (1998).
A system for information extraction (IE) from text automatically extracts a set of predefined information from real-world texts. This information is generally summarized in a tabular format (i.e. in a user-defined template). The user can then be presented with such a summary, or a database can be populated with the extracted information. From a historical point of view, the first relevant activities in IE were carried out in the US, where comparative evaluations (the Message Understanding Conferences) have taken place since the end of the 1980s. In recent years, IE applications have been developed in fields ranging from finance to medicine to industrial diagnosis. The aim of this tutorial is to provide the participants with:
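The template-filling idea can be sketched in a few lines. This is a hypothetical toy, not the approach of any MUC system: each template field is paired with a simple regular expression, and the extractor fills one slot per field from the input text. Real IE systems rely on much richer linguistic analysis; only the input/output shape is illustrated here.

```python
import re

# Hypothetical user-defined template: field name -> pattern for its filler.
TEMPLATE = {
    "company": re.compile(r"(?P<v>[A-Z][A-Za-z]+ (?:Inc|Corp|SpA)\.?)"),
    "amount":  re.compile(r"(?P<v>\$[\d,.]+ (?:million|billion))"),
    "date":    re.compile(r"(?P<v>\d{1,2} (?:January|February|March|April|May"
                          r"|June|July|August|September|October|November"
                          r"|December) \d{4})"),
}

def extract(text):
    """Fill the template: one slot per field, None when nothing matches."""
    filled = {}
    for field, pattern in TEMPLATE.items():
        m = pattern.search(text)
        filled[field] = m.group("v") if m else None
    return filled

text = "On 12 March 1999, Acme Corp. announced profits of $4.2 million."
print(extract(text))
```

The filled dictionary corresponds to one row of the tabular summary, ready to be shown to the user or inserted into a database.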
Roberto Basili is a researcher at the Università di Roma Tor Vergata, in the Natural Language Processing Group. He has participated in a number of projects in the field of IE, among them ECRAN, a project funded by the European Union for the definition of a new generation of IE systems. His work in ECRAN mainly concerned models for lexical acquisition.
Fabio Ciravegna is responsible for the IE project at ITC-irst, Trento. He has been the ITC-irst manager of FACILE, a project funded by the European Union on text classification and IE; within FACILE he coordinated the IE activity for the whole project. FACILE was selected by the EU as one of the most successful EU-funded projects in the area of Language Engineering for 1998 (IST98 conference, Vienna). His work mainly concentrates on architectures and applications for IE, and on parsing technologies for text analysis. From 1988 to 1993 he was a researcher at Centro Ricerche Fiat, where he coordinated the IE project.
Slides (.pdf version)
Many problems in the field of Artificial Intelligence can be modelled and solved as Constraint Satisfaction Problems (CSPs). A programming paradigm suitable for modelling and solving such problems is Constraint Programming (CP). In recent years CP has been widely used for solving large real-life problems, thanks to its modelling flexibility and to efficient propagation algorithms that prune the search space.
The goal of the tutorial is to present some preliminaries of Constraint Programming and to show how combinatorial optimization problems such as scheduling, planning, timetabling and routing can be modelled and solved with CP. Some commercial CP systems will be described, with special attention to the search strategies and propagation algorithms they use.
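As a hypothetical illustration of how propagation prunes the search space, the following sketch solves a toy map-colouring CSP by backtracking with forward checking, the simplest propagation scheme: whenever a variable is assigned, its value is removed from the domains of its unassigned neighbours, and a branch is abandoned as soon as some domain becomes empty. The problem instance is invented; commercial CP systems use much stronger propagation (e.g. arc consistency) and richer search strategies.

```python
# Toy CSP: colour regions so that no two neighbouring regions share a colour.
def solve(domains, neighbours, assignment=None):
    """Backtracking search with forward checking; returns an assignment or None."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(domains):
        return assignment
    # First-fail heuristic: branch on the unassigned variable with the smallest domain.
    var = min((v for v in domains if v not in assignment),
              key=lambda v: len(domains[v]))
    for value in domains[var]:
        # Forward checking: remove `value` from the domains of unassigned neighbours.
        pruned = {n: domains[n] - {value}
                  for n in neighbours[var] if n not in assignment}
        if all(pruned.values()):  # prune this branch if any domain is wiped out
            new_domains = {**domains, var: {value}, **pruned}
            result = solve(new_domains, neighbours, {**assignment, var: value})
            if result:
                return result
    return None

neighbours = {"WA": {"NT", "SA"}, "NT": {"WA", "SA", "Q"},
              "SA": {"WA", "NT", "Q"}, "Q": {"NT", "SA"}}
domains = {v: {"red", "green", "blue"} for v in neighbours}
print(solve(domains, neighbours))
```

Note that consistency of each new assignment with earlier ones is guaranteed by the propagation itself: a conflicting value has already been removed from the domain before it can be tried.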
The main goal of the tutorial is to situate computer vision in the field of artificial intelligence research. The main topics of the tutorial are:
Antonio Chella was born in Florence on March 4, 1961. He received his laurea degree in Electronic Engineering in 1988 and his Ph.D. in Computer Science in 1993 from the University of Palermo, Italy. Currently, he is an associate professor of Robotics at the Department of Electrical Engineering of the University of Palermo. His research interests are in the field of autonomous robotics, artificial vision, neural networks, hybrid (symbolic/subsymbolic) systems and knowledge representation.