Course of
A.Y. 1999/2000
Prof. Donato Malerba

Objectives. This course focuses on advanced concepts and methodologies for the development of data-oriented software systems, whose main issues are how to organize, manipulate, and access data, as well as how to extract information from row data. The course is organized in two parts: databases and architectures and paradigms for data analysis. The former focuses on the foundations of relational databases (models and languages), on the conceptual, logical and physical design of databases, as well as on some database evolutions (active databases, relational object-relational databases, object-oriented databases, multimedia databases, integration in the World Wide Web). The second part of the course covers concepts on data warehousing, OLAP, knowledge discovery process, and data mining methods (decision tree induction). Databases considered for the practice are Oracle 8i and Microsoft Access 97.

Prerequisites. Some familiarity with algorithms and data structures, databases and information systems.

Evaluation: written exam on relational database design and querying + homework assignment (course project) + oral exam on parts 5-12.

Program of the course A.Y. 1999-2000

1. Introduction to database systems.

Organizational systems. Information systems for production. Components and evolution of information systems. Requirements of complex information systems. Databases and DBMS. Data models. Abstraction levels in a database. Database languages. DBMS interfaces. Database users. Database control: integrity, reliability, security. Classification of DBMS. Modules of a DBMS. Pros and cons in using DBMS. 2. Data models. Different aspects of data modeling: ontological, abstract linguistic, concrete linguistic, pragmatic. The object-oriented data model: representation of structures of concrete, abstract and procedural knowledge, representation of communication. Alternative data models: hierarchical, network, relational. 3. The relational model. Relations and tables. Relations with attributes. Relations and databases. Incomplete information and null values. Integrity constraints. Tuple constraints. Keys. Keys and null values. Referential constraints. 3. Languages for relational databases. Relational algebra: primitive operators (union, difference, renaming, selection, projection, Cartesian join), derived operators (intersection, quotient, join, natural join, external join, semi-join), additional operators (complement, aggregation functions, transitive closure). Algebraic properties of relational operators.
Relational calculus: domain relational calculus, Tuple calculus with range declarations.
Algebra and calculus with null values
SQL. Historical evolution. Simple queries in SQL. Set queries. Group by queries. Data definition and administration in SQL: database creation, table creation, domain definition, inserting/deleting/modifying tuples, integrity constraints (intra-relational and inter-relational), modifying a schema, views, assertions, physical models (parameters and indices), relational catalogues, access control, tools for DB administration. Programming applications in SQL: host languages, languages with API interface, integrated languages (Oracle PL/SQL), transaction programming (explicit repetition of transactions, transactions with different levels of isolation).
QBE: a graphical query language.
4. Database design. Design techniques and models: the life cycle of information systems, a methodology for database design, the Entity-Relationship model (constructs and documentation of E-R schemas).
Conceptual design: requirements collection and analysis, general criteria for data representation, design strategies (top-down, bottom-up, inside-out, mixed), quality of a Conceptual schema, a comprehensive method for conceptual design.
Logical design: performance analysis on E-R schemas, restructuring of E-R schemas (analysis of redundancies, removing generalizations, partitioning and merging of entities and relationships, selection of primary identifiers), translation into the relational model (entities and many-to-many relationships, one-to-many relationships, entities with external identifiers, one-to-one relationships, graphical representation of translations, translation of a complex schema, summary tables).
Normalization: redundancies and anomalies, functional dependencies, Boyce - Codd normal form, decomposition properties (lossless decomposition, preservation of dependencies, qualities of decompositions), third normal form, database design and normalization (Verification of normalization on entities and relationships, violation of normal forms and restructuring of conceptual schemas).
5. Active databases. Databases and production systems. Trigger behaviour in a relational system. Definition and use of triggers in Oracle. Advanced features of active rules. Properties of active rules (termination, confluence e observational determinism). Design and implementation issues for active databases. The IDEA methodology. Applications of active databases. 6. Object-oriented and object-relational databases. Non-first normal form models. The relational object model. SQL-3: tuples and objects, type hierarchies, abstract data types, flattening and nesting queries. The third generation database manifesto. An object relational DBMS: Illustra. Objects in Oracle 8: abstract data types, collections, row objects, object views. Object-oriented databases: types, classes, methods, generalization hierarchies, persistence, redefinition of methods, refinement of properties and methods. The object-oriented database manifesto. The ODMG standard for object-oriented databases: ODL e OQL. An OODBMS: ObjectStore. 7. Object-oriented modeling in UML. A conceptual Model of the UML: Structural thing (class, interface, collaboration, use case, active class, component, node); Behavioral thing (interaction; state machine); Grouping thing (package); Annotational thing (note); Relationship: dependence; association, generalization, realization; Diagram:
class diagram, object diagram, use case diagram, sequence diagram, collaboration diagram, statechart diagram, activity diagram, component diagram, deployment diagram. Common mechanisms in UML: specifications, ornaments, extensions (stereotypes, labelled values and constraints). The five views: use case view, design view, implementation view, process view, deployment view. The Eastern State University: a case study.
8. Multimedia databases. Multimedia information systems. Multimedia data. Multimedia data management. Recall and precision. Information retrieval systems for text. Manual and automatic indexing. Index creation. Functionalities and design of an IRS: the case of WebClass. 9. Databases and the World Wide Web. The Internet and the World Wide Web: nodes, addresses and protocols, the World Wide Web, anchors and URL, HTML.
Database access through the Web: atabase access through common gateway interface (CGI) programs, field-oriented interaction, advantages and disadvantages of DBMS Web, the universal server.
Web and databases: differences and similarities.
Client-server architecture. Three-tier architecture. JDBC. Servlets. Java Server Pages. Enterprise Java Beans.
10. Microsoft Access’97. Main characteristics of the DBMS. Basic notions: relations, masks, queries, reports, macros and modules. Creating a new database. Data types. Relationships (one-to-one, one-to-many, many-to-many). Referential constraints. Primary keys and indices. Definition of a selection query (single and multiple tables), cross-fields queries, statement queries e parameterized queries. Operators and expressions in Access. Updating tables through queries. Queries with aggregation functions. Access’97 and the WWW: hyperlinks, data export in HTML format, use of Access’97 HTML models, data import from HTML tables, dynamic creation of Web pages through Internet Database Connector (IDC). Creation of Active Server Pages (ASP). 11. Architectures and paradigms for data analysis. Operational vs. business data. Business Intelligence technologies. Decision support systems (DSS), Executive Information Systems (EIS) and Management Informaiton Systems (MIS). Main characteristics of a data warehouse. Data warehouse architectures. Data warehouse schemas: star schema and snowflake schema. OLAP and operations for data analysis: drill down and roll up. ROLAP and MOLAP. 12. Knowledge discovery in databases. Knowledge discovery in databases: definition and issues. The database knowledge discovery process: selection, preprocessing, transformation, data mining, interpretation and evaluation. Data mining: objectives, tasks, pattern representation, pattern evaluation, search methods. Decision tree induction: the three fundamental problems (associating classes to leaves, choosing the test, stopping criteria). Information gain and gain-ratio. Decision tree pruning methods. From decision trees to production rules.
Texts and Papers

P. Atzeni, S. Ceri, S. Paraboschi & R. Torlone
Database Systems - Concepts, Languages and Architectures
Chapters: 1, 2, 3 (3.15 excluded), 4, 5, 6, 7, 8, 11, 12, 13, 14, Appendices A and D.

A. Albano, G. Ghelli, R. Orsini
Basi di dati relazionali e a oggetti
Zanichelli, 1997
Chapters: 1, 2, 5, 7, 8, 9, 10, 11.1, 13.3, 13.6, 15.4.1

G. Lausen, G. Vossen
Models and Languages of Object-Oriented Databases
Addison-Wesley, International Computer Science Series, 1997
Chapters 1, 2, 3.

G. Booch, I. Jacobson, J. Rumbaugh
The Unified Modeling Language: User Guide
Addison-Wesley, 1999
Chapters 1, 2, 3

U. Fayyad, G. Piatesky-Shapiro, P. Smyth. From data mining to knowledge discovery: an overview.
In U. Fayyad, G. Piatesky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining,
AAAI/MIT Press, pp. 1-35, 1996.

W.J. Frawley, G. Piatesky-Shapito, C.J. Matheus. Knowledge discovery in databases.
In W.J. Frawley, G. Piatesky-Shapito, C.J. Matheus, Knowledge discovery in databases,
AAAI/MIT Press, pp. 1-27, 1991.

T. Mitchell
Machine Learning
McGraw Hill, 1997.
Chapters 1 e 3.

Further references for homeworks on database design:

L. Cabibbo, R. Torlone, C. Batini
Basi di dati: Progetti ed esercizi svolti.
Pitagora editrice, Bologna, 1995.

C. Francalanci, F. Schreiber, L. Tanca
Progetto di dati e funzioni
Società Editrice Esculapio, Bologna, 1995.

Texts, papers, and copies of transparencies are made available in the Library of the Department of Informatics.