Course of
201121 – DATA BASES AND KNOWLEDGE BASES
A.Y. 2000/2001
Prof. Donato Malerba

Objectives. This course focuses on advanced concepts and methodologies for the development of data-oriented software systems, whose main issues are how to organize, manipulate, and access data, as well as how to extract information from row data. The course is organized in two parts: databases and architectures and paradigms for data analysis. The former focuses on the foundations of relational databases (models and languages), on the conceptual, logical and physical design of databases, as well as on some database evolutions (active databases, relational object-relational databases, multimedia databases and integration in the World Wide Web). The second part of the course covers concepts on data warehousing, OLAP, knowledge discovery process, and data mining methods (decision tree induction). Databases considered for the practice are Oracle 8i and Microsoft Access 97.

Prerequisites. Some familiarity with algorithms and data structures, databases and information systems.

Evaluation: written exam on relational database design and querying + homework assignment (course project) + oral exam on parts 5-12.

Program of the course

1. Introduction to database systems.

Organizational systems. Information systems for production. Components and evolution of information systems. Requirements of complex information systems. Databases and DBMS. Data models. Abstraction levels in a database. Database languages. DBMS interfaces. Database users. Database control: integrity, reliability, security. Classification of DBMS. Modules of a DBMS. Pros and cons in using DBMS. 2. Data models. Different aspects of data modeling: ontological, abstract linguistic, concrete linguistic, pragmatic. The object-oriented data model: representation of structures of concrete, abstract and procedural knowledge, representation of communication. Alternative data models: hierarchical, network, relational. 3. Relational databases: model and languages. Relations and tables. Relations with attributes. Relations and databases. Incomplete information and null values. Integrity constraints. Tuple constraints. Keys. Keys and null values. Referential constraints.
Relational algebra: primitive operators (union, difference, renaming, selection, projection, Cartesian join), derived operators (intersection, quotient, join, natural join, external join, semi-join), additional operators (complement, aggregation functions, transitive closure). Algebraic properties of relational operators.
Relational calculus: domain relational calculus, Tuple calculus with range declarations.
Algebra and calculus with null values
Views.
SQL. Historical evolution. Simple queries in SQL. Set queries. Group by queries. Data definition and administration in SQL: database creation, table creation, domain definition, inserting/deleting/modifying tuples, integrity constraints (intra-relational and inter-relational), modifying a schema, views, assertions, physical models (parameters and indices), relational catalogues, access control, tools for DB administration. Programming applications in SQL: host languages, languages with API interface, integrated languages (Oracle PL/SQL), transaction programming (explicit repetition of transactions, transactions with different levels of isolation).
QBE: a graphical query language.
4. Database design. Design techniques and models: the life cycle of information systems, a methodology for database design, the Entity-Relationship model (constructs and documentation of E-R schemas).
Conceptual design: requirements collection and analysis, general criteria for data representation, design strategies (top-down, bottom-up, inside-out, mixed), quality of a Conceptual schema, a comprehensive method for conceptual design.
Logical design: performance analysis on E-R schemas, restructuring of E-R schemas (analysis of redundancies, removing generalizations, partitioning and merging of entities and relationships, selection of primary identifiers), translation into the relational model (entities and many-to-many relationships, one-to-many relationships, entities with external identifiers, one-to-one relationships, graphical representation of translations, translation of a complex schema, summary tables).
Normalization: redundancies and anomalies, functional dependencies, Boyce - Codd normal form, decomposition properties (lossless decomposition, preservation of dependencies, qualities of decompositions), third normal form, database design and normalization (Verification of normalization on entities and relationships, violation of normal forms and restructuring of conceptual schemas).
5. Active databases. Databases and production systems. Trigger behaviour in a relational system. Definition and use of triggers in Oracle. Advanced features of active rules. Properties of active rules (termination, confluence e observational determinism). Design and implementation issues for active databases. The IDEA methodology. Applications of active databases. 6. Object-oriented and object-relational databases. Non-first normal form models. The relational object model. SQL-3: tuples and objects, type hierarchies, abstract data types, flattening and nesting queries. The third generation database manifesto. An object relational DBMS: Illustra. Objects in Oracle 8: abstract data types, collections, row objects, object views. 7. Multimedia databases. Multimedia information systems. Multimedia data. Multimedia data management. Recall and precision. Information retrieval systems for text. Manual and automatic indexing. Index creation. Functionalities and design of an IRS: the case of WebClass. 8. Databases and the World Wide Web. The Internet and the World Wide Web: nodes, addresses and protocols, the World Wide Web, anchors and URL, HTML.
Database access through the Web: atabase access through common gateway interface (CGI) programs, field-oriented interaction, advantages and disadvantages of DBMS Web, the universal server.
Web and databases: differences and similarities.
9. Microsoft Access’97. Main characteristics of the DBMS. Basic notions: relations, masks, queries, reports, macros and modules. Creating a new database. Data types. Relationships (one-to-one, one-to-many, many-to-many). Referential constraints. Primary keys and indices. Definition of a selection query (single and multiple tables), cross-fields queries, statement queries e parameterized queries. Operators and expressions in Access. Updating tables through queries. Queries with aggregation functions. Access’97 and the WWW: hyperlinks, data export in HTML format, use of Access’97 HTML models, data import from HTML tables, dynamic creation of Web pages through Internet Database Connector (IDC) 10. Oracle 8. Some tools of the DBMS Oracle 8. Creation of an Oracle database. Conceptual design with Oracle Designer. Generation of a logical model with Oracle Design Transformer. Generation of forms, queries and triggers with Oracle Developer. 11. Architectures and paradigms for data analysis. Operational vs. business data. Business Intelligence technologies. Decision support systems (DSS), Executive Information Systems (EIS) and Management Informaiton Systems (MIS). Main characteristics of a data warehouse. Data warehouse architectures. Data warehouse schemas: star schema and snowflake schema. OLAP and operations for data analysis: drill down and roll up. ROLAP and MOLAP. 12. Knowledge discovery in databases. Knowledge discovery in databases: definition and issues. The database knowledge discovery process: selection, preprocessing, transformation, data mining, interpretation and evaluation. Data mining: objectives, tasks, pattern representation, pattern evaluation, search methods. Decision tree induction.
Texts and Papers

P. Atzeni, S. Ceri, S. Paraboschi & R. Torlone
Database Systems - Concepts, Languages and Architectures
McGraw-Hill,2000.
Chapters: 1, 2, 3 (3.15 excluded), 4, 5, 6, 7, 8, 11.3-11.4, 12, 13, 14, Appendices A and D.

A. Albano, G. Ghelli, R. Orsini
Basi di dati relazionali e a oggetti
Zanichelli, 1997
Chapters: 1, 2, 5, 7, 8, 9, 10, 11.1, 13.3, 13.6, 15.4.1

U. Fayyad, G. Piatesky-Shapiro, P. Smyth. From data mining to knowledge discovery: an overview.
In U. Fayyad, G. Piatesky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining,
AAAI/MIT Press, pp. 1-35, 1996.

W.J. Frawley, G. Piatesky-Shapito, C.J. Matheus. Knowledge discovery in databases.
In W.J. Frawley, G. Piatesky-Shapito, C.J. Matheus, Knowledge discovery in databases,
AAAI/MIT Press, pp. 1-27, 1991.

T. Mitchell
Machine Learning
McGraw Hill, 1997.
Chapters 1 e 3.

Further references for homework assignments on database design:

L. Cabibbo, R. Torlone, C. Batini
Basi di dati: Progetti ed esercizi svolti.
Pitagora editrice, Bologna, 1995.

C. Francalanci, F. Schreiber, L. Tanca
Progetto di dati e funzioni
Società Editrice Esculapio, Bologna, 1995.

Texts, papers, and copies of transparencies are made available in the Library of the Department of Informatics.