Discretization of Continuous-Valued Data
in Symbolic Classification Learning

F. Esposito, D. Malerba, G. Semeraro, S. Caggese

Dipartimento di Informatica, Università degli Studi di Bari
via Orabona 4, 70126 Bari, Italy
{esposito | malerba | semeraro | caggese}@lacam.uniba.it

Summary: Handling both nominal (categorical) and continuous (numerical) variables is a central issue in practical applications of classification learning. In most symbolic machine learning systems, continuous-valued attributes are discretized prior to the attribute selection process by partitioning the range of each attribute into subranges. This paper addresses the problem of inducing classification rules from both numeric and symbolic data by discretizing continuous-valued attributes during the learning process: a specialization operator is proposed for this purpose, and a heuristic function is used to improve the operator's efficiency. The operator has been embedded into a classification learning system, INDUBI/CSL, and tested on several data sets.
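To make the idea of discretizing during learning more concrete, the following is a minimal sketch. It is not the INDUBI/CSL operator, which this summary does not specify. It assumes a single continuous attribute, two classes, and a rule specialized by adding one condition of the form `x <= t` or `x > t`, with candidate thresholds taken as midpoints between consecutive distinct observed values. The scoring function (Laplace-corrected accuracy, a standard rule-quality measure) is swapped in as a stand-in for the paper's heuristic. The names `candidate_cuts`, `laplace`, and `specialize` are hypothetical.

```python
from typing import List, Optional, Tuple

# Each example: (value of the continuous attribute, True if it belongs to the target class)
Example = Tuple[float, bool]


def candidate_cuts(values: List[float]) -> List[float]:
    """Midpoints between consecutive distinct sorted values (candidate thresholds)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def laplace(pos: int, neg: int) -> float:
    """Laplace-corrected accuracy of a rule covering `pos` positives and `neg` negatives.

    Stand-in heuristic (assumption); not the heuristic used in the paper.
    """
    return (pos + 1) / (pos + neg + 2)


def specialize(examples: List[Example]) -> Optional[Tuple[str, float, float]]:
    """Pick the condition `x <= t` or `x > t` that best specializes a rule.

    `examples` are those currently covered by the rule. Returns
    (operator, threshold, score), or None if the attribute has fewer
    than two distinct values.
    """
    best: Optional[Tuple[str, float, float]] = None
    for t in candidate_cuts([v for v, _ in examples]):
        for op in ("<=", ">"):
            # Class labels of the examples that still satisfy the specialized rule
            covered = [y for v, y in examples if (v <= t if op == "<=" else v > t)]
            pos = sum(covered)
            score = laplace(pos, len(covered) - pos)
            if best is None or score > best[2]:  # ties keep the earlier candidate
                best = (op, t, score)
    return best


if __name__ == "__main__":
    data = [(1.0, False), (2.0, False), (3.0, True), (4.0, True), (5.0, True)]
    print(specialize(data))
```

For instance, with the values 1.0 and 2.0 labeled negative and 3.0 to 5.0 labeled positive, the sketch selects `x > 2.5`, which covers the three positives and no negatives. Because the threshold is chosen against the examples the rule currently covers, the subranges can differ from rule to rule, unlike a single partition fixed before learning.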

Key words: Symbolic Data Analysis, Discrimination and Classification, Pattern Recognition