201121 – DATA BASES AND KNOWLEDGE BASES
Prof. Donato Malerba
course focuses on advanced concepts and methodologies for the development
of data-oriented software systems, whose main issues are how to organize,
manipulate, and access data, as well as how to extract information from
row data. The course is organized in two parts: databases and architectures
and paradigms for data analysis. The former focuses on the foundations
of relational databases (models and languages), on the conceptual, logical
and physical design of databases, as well as on some database evolutions
(active databases, relational object-relational databases, multimedia databases
and integration in the World Wide Web). The
second part of the course covers concepts on data warehousing, OLAP, knowledge
discovery process, and data mining methods (decision tree induction). Databases
considered for the practice are Oracle 8i and Microsoft Access 97.
Prerequisites. Some familiarity with algorithms
and data structures, databases and information systems.
Evaluation: written exam on relational database
design and querying + homework assignment (course project) + oral exam on parts 5-12.
Program of the course
1. Introduction to database systems.
Organizational systems. Information systems for production.
Components and evolution of information systems. Requirements of complex
information systems. Databases and DBMS. Data models. Abstraction levels
in a database. Database languages. DBMS interfaces. Database users. Database
control: integrity, reliability, security. Classification of DBMS. Modules
of a DBMS. Pros and cons in using DBMS.
2. Data models.
Different aspects of data modeling: ontological, abstract
linguistic, concrete linguistic, pragmatic. The object-oriented data model:
representation of structures of concrete, abstract and procedural knowledge,
representation of communication. Alternative data models: hierarchical,
3. Relational databases: model and languages.
Relations and tables. Relations with attributes. Relations
and databases. Incomplete information and null values. Integrity constraints.
Tuple constraints. Keys. Keys and null values. Referential constraints.
Relational algebra: primitive operators (union, difference,
renaming, selection, projection, Cartesian join), derived operators (intersection,
quotient, join, natural join, external join, semi-join), additional operators
(complement, aggregation functions, transitive closure). Algebraic properties
of relational operators.
Relational calculus: domain relational calculus, Tuple
calculus with range declarations.
Algebra and calculus with null values
SQL. Historical evolution. Simple queries in SQL. Set
queries. Group by queries. Data definition and administration in SQL: database
creation, table creation, domain definition, inserting/deleting/modifying
tuples, integrity constraints (intra-relational and inter-relational),
modifying a schema, views, assertions, physical models (parameters and
indices), relational catalogues, access control, tools for DB administration.
Programming applications in SQL: host languages, languages with API interface,
integrated languages (Oracle PL/SQL), transaction programming (explicit
repetition of transactions, transactions with different levels of isolation).
QBE: a graphical query language.
4. Database design.
Design techniques and models: the life cycle of information
systems, a methodology for database design, the Entity-Relationship model
(constructs and documentation of E-R schemas).
Conceptual design: requirements collection and analysis,
general criteria for data representation, design strategies (top-down,
bottom-up, inside-out, mixed), quality of a Conceptual schema, a comprehensive
method for conceptual design.
Logical design: performance analysis on E-R schemas,
restructuring of E-R schemas (analysis of redundancies, removing generalizations,
partitioning and merging of entities and relationships, selection of primary
identifiers), translation into the relational model (entities and many-to-many
relationships, one-to-many relationships, entities with external identifiers,
one-to-one relationships, graphical representation of translations, translation
of a complex schema, summary tables).
Normalization: redundancies and anomalies, functional
dependencies, Boyce - Codd normal form, decomposition properties (lossless
decomposition, preservation of dependencies, qualities of decompositions),
third normal form, database design and normalization (Verification of normalization
on entities and relationships, violation of normal forms and restructuring
of conceptual schemas).
5. Active databases.
Databases and production systems. Trigger behaviour
in a relational system. Definition and use of triggers in Oracle. Advanced
features of active rules. Properties of active rules (termination, confluence
e observational determinism). Design and implementation issues for active
databases. The IDEA methodology. Applications of active databases.
6. Object-oriented and object-relational databases.
Non-first normal form models. The relational object
model. SQL-3: tuples and objects, type hierarchies, abstract data types,
flattening and nesting queries. The third generation database manifesto.
An object relational DBMS: Illustra. Objects in Oracle 8: abstract data
types, collections, row objects, object views.
7. Multimedia databases.
Multimedia information systems. Multimedia data. Multimedia
data management. Recall and precision. Information retrieval systems for
text. Manual and automatic indexing. Index creation. Functionalities and
design of an IRS: the case of WebClass.
8. Databases and the World Wide Web.
The Internet and the World Wide Web: nodes, addresses
and protocols, the World Wide Web, anchors and URL, HTML.
Database access through the Web: atabase access through
common gateway interface (CGI) programs, field-oriented interaction, advantages
and disadvantages of DBMS Web, the universal server.
Web and databases: differences and similarities.
9. Microsoft Access’97.
Main characteristics of the DBMS. Basic notions: relations,
masks, queries, reports, macros and modules. Creating a new database. Data
types. Relationships (one-to-one, one-to-many, many-to-many). Referential
constraints. Primary keys and indices. Definition of a selection query
(single and multiple tables), cross-fields queries, statement queries e
parameterized queries. Operators and expressions in Access. Updating tables
through queries. Queries with aggregation functions. Access’97 and the
WWW: hyperlinks, data export in HTML format, use of Access’97 HTML models,
data import from HTML tables, dynamic creation of Web pages through Internet
Database Connector (IDC)
10. Oracle 8.
Some tools of the DBMS Oracle 8. Creation of an Oracle database. Conceptual design with Oracle Designer. Generation of a logical model with Oracle Design Transformer. Generation of forms, queries and triggers with Oracle Developer.
11. Architectures and paradigms for data analysis.
Operational vs. business data. Business Intelligence
technologies. Decision support systems (DSS), Executive Information Systems
(EIS) and Management Informaiton Systems (MIS). Main characteristics of
a data warehouse. Data warehouse architectures. Data warehouse schemas:
star schema and snowflake schema. OLAP and operations for data analysis:
drill down and roll up. ROLAP and MOLAP.
12. Knowledge discovery in databases.
Knowledge discovery in databases: definition and issues.
The database knowledge discovery process: selection, preprocessing, transformation,
data mining, interpretation and evaluation. Data mining: objectives, tasks,
pattern representation, pattern evaluation, search methods. Decision tree
Texts and Papers
P. Atzeni, S. Ceri, S. Paraboschi & R. Torlone
Database Systems - Concepts, Languages and Architectures
Chapters: 1, 2, 3 (3.15 excluded), 4, 5, 6, 7, 8, 11.3-11.4,
12, 13, 14, Appendices A and D.
A. Albano, G. Ghelli, R. Orsini
Basi di dati relazionali e a oggetti
Chapters: 1, 2, 5, 7, 8, 9, 10, 11.1, 13.3, 13.6, 15.4.1
U. Fayyad, G. Piatesky-Shapiro, P. Smyth. From data mining
to knowledge discovery: an overview.
In U. Fayyad, G. Piatesky-Shapiro, P. Smyth, R. Uthurusamy
(Eds.), Advances in Knowledge Discovery and Data Mining,
AAAI/MIT Press, pp. 1-35, 1996.
W.J. Frawley, G. Piatesky-Shapito, C.J. Matheus. Knowledge
discovery in databases.
In W.J. Frawley, G. Piatesky-Shapito, C.J. Matheus, Knowledge
discovery in databases,
AAAI/MIT Press, pp. 1-27, 1991.
McGraw Hill, 1997.
Chapters 1 e 3.
Further references for homework assignments on database design:
L. Cabibbo, R. Torlone, C. Batini
Basi di dati: Progetti ed esercizi svolti.
Pitagora editrice, Bologna, 1995.
C. Francalanci, F. Schreiber, L. Tanca
Progetto di dati e funzioni
Società Editrice Esculapio, Bologna, 1995.
Texts, papers, and copies of transparencies are made available
in the Library of the Department of Informatics.