SONN

Symbolic Objects K-Nearest Neighbour classifier

 

A short description 
Architecture of the system 
Experiment: “Mushroom data set”  
Experiment: “Dermatology data set”  
Experiment: “Adult data set”
Using SONN system
Related publications
Acknowledgments



A short description

SONN is a prototype system for the classification of Symbolic Objects (SOs) by means of the K-NN algorithm.

The problem solved by SONN can be formally stated as follows:

Given

Classify each test example SO by finding its associated class C'.

SOs are aggregated data described by the triple (Y,R,d) where:

There are two main kinds of SOs: Boolean and Probabilistic. In the latter case, a probability distribution is associated with each description. The SONN system works on both types of SOs. The classical K-NN algorithm has been extended so that it can be applied to this new kind of data. The most important features of the extended version, compared with the classical one, are: the automated selection of the optimal K on the basis of cross-validation; local distance weighting; the use of non-Euclidean dissimilarity measures between SOs; and the output, for each test example, of the list of all classes with an associated probability (a symbolic modal variable) instead of a single class value.
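The distance-weighted vote and the probabilistic output can be illustrated with a minimal Python sketch. It is not SONN's implementation: the representation of a Boolean SO as a dictionary of value sets, the averaged Jaccard dissimilarity, the inverse-distance weights and all function names are assumptions made only for illustration.

```python
from collections import defaultdict

def jaccard_dissimilarity(a, b):
    """Average Jaccard distance over the variables of two Boolean SOs.

    Hypothetical stand-in for a non-Euclidean measure; it is not one of
    the measures shipped with SONN.
    """
    total = 0.0
    for var in a:
        union = a[var] | b[var]
        inter = a[var] & b[var]
        total += 1.0 - (len(inter) / len(union) if union else 1.0)
    return total / len(a)

def knn_class_probabilities(query, training, k,
                            dissimilarity=jaccard_dissimilarity, eps=1e-9):
    """Return {class: probability} from a distance-weighted vote of the
    k nearest training examples. `training` is a list of (so, label)."""
    scored = ((dissimilarity(query, so), label) for so, label in training)
    neighbours = sorted(scored, key=lambda pair: pair[0])[:k]
    weights = defaultdict(float)
    for dist, label in neighbours:
        # Assumed weighting scheme: closer neighbours count more.
        weights[label] += 1.0 / (dist + eps)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Toy data, invented for this example only.
training = [
    ({"cap": {"red", "brown"}, "odor": {"foul"}}, "poisonous"),
    ({"cap": {"white"}, "odor": {"none", "almond"}}, "edible"),
    ({"cap": {"brown"}, "odor": {"none"}}, "edible"),
]
query = {"cap": {"brown"}, "odor": {"none"}}
probs = knn_class_probabilities(query, training, k=3)
print(probs)
```

The result lists every class found among the neighbours together with a probability, rather than a single predicted label.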

 



Architecture of the system

The architecture of SONN is very simple, as is its operation. It is a wizard application that guides the user through the selection of all the parameters needed for the classification. First the user selects the input file containing the SOs, then the class, the symbolic variables of interest for the classification, the percentage of training examples that the system chooses randomly among all the SOs, the number of folds used to find the optimal K and, finally, the dissimilarity measure used to find the K nearest neighbours among the training examples.
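The parameters collected by the wizard correspond to a random training/test split followed by a cross-validated search for K. The Python sketch below shows one way such a procedure could look. The names `split`, `select_k` and `predict`, the mismatch-based dissimilarity, the unweighted majority vote and the synthetic data are all hypothetical and are not taken from SONN.

```python
import random
from collections import Counter

def dissimilarity(a, b):
    # Hypothetical stand-in: fraction of variables whose value sets differ.
    return sum(a[v] != b[v] for v in a) / len(a)

def predict(query, training, k):
    # Plain majority vote among the k nearest training examples.
    nearest = sorted(training, key=lambda ex: dissimilarity(query, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def split(examples, train_fraction, rng):
    # Randomly choose the requested fraction of SOs as training examples.
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def select_k(training, n_folds, candidate_ks):
    # n-fold cross-validation on the training set; ties keep the smaller K.
    folds = [training[i::n_folds] for i in range(n_folds)]
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        correct = 0
        for i, held_out in enumerate(folds):
            rest = [ex for j, f in enumerate(folds) if j != i for ex in f]
            correct += sum(predict(so, rest, k) == label
                           for so, label in held_out)
        acc = correct / len(training)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc

# Synthetic, perfectly separable data invented for this example.
examples = ([({"x": {"a"}, "y": {"p"}}, "A")] * 10
            + [({"x": {"b"}, "y": {"q"}}, "B")] * 10)
train, test = split(examples, train_fraction=0.8, rng=random.Random(0))
best_k, best_acc = select_k(train, n_folds=4, candidate_ks=[1, 3, 5])
print(best_k, best_acc)
```

The selected K would then be used to classify the held-out test examples with the chosen dissimilarity measure.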



Experiment: “Mushroom data set”

The problem is to classify different mushroom families into two categories: poisonous and non-poisonous.
There are 2 experiments, with different types of inputs:



Experiment: “Dermatology data set”

The problem is to classify groups of patients according to skin diseases.
There are 2 experiments, with different types of inputs:



Experiment: “Adult data set”

The problem is to classify groups of individuals according to two income bands.
There are 2 experiments, with different types of inputs:



Using SONN system

SONN.exe. The SONN system

DissDLL.dll The library of dissimilarity measures

To use the SONN system, download the executable file and the library into the same folder. Then start the system and choose one of the downloaded input files.

Warning: the SONN system is free for evaluation, research and teaching purposes, but not for commercial purposes.
Please Acknowledge



Related publications

  • F. Esposito, D. Malerba, & V. Tamma (2000). Dissimilarity Measures for Symbolic Objects. Chapter 8.3 in H.-H. Bock and E. Diday (Eds.), Analysis of Symbolic Data. Exploratory methods for extracting statistical information from complex data, Series: Studies in Classification, Data Analysis, and Knowledge Organization, vol. 15, Springer-Verlag: Berlin, 165-185.

  • D. Malerba, F. Esposito, V. Gioviale & V. Tamma (2001). Comparing dissimilarity measures in Symbolic Data Analysis. Proceedings of the Joint Conferences on "New Techniques and Technologies for Statistics" and "Exchange of Technology and Know-how" (ETK-NTTS'01), 473-481.

  • D. Malerba, F. Esposito, M. Monopoli (2002). Comparing dissimilarity measures for probabilistic symbolic objects. In A. Zanasi, C. A. Brebbia, N.F.F. Ebecken, P. Melli (Eds.) Data Mining III, Series Management Information Systems, Vol 6, 31-40, WIT Press, Southampton, UK.

  • C. D'Amato, D. Malerba, F. Esposito, M. Monopoli (2003). Extending the K-Nearest Neighbour classification algorithm to symbolic objects. Convegno Scientifico Intermedio SIS, 9-11 Giugno 2003, Università degli Studi di Napoli "Federico II".




Acknowledgments

The SONN system has been implemented within the context of the following projects:
  • IST-2000-25161 project ASSO "Analysis System of Symbolic Official data", 2001-2003.
  • COFIN 2001 project "Metodi di estrazione, di validazione e di rappresentazione dell'informazione statistica in un contesto decisionale" (Methods of knowledge discovery, validation and representation of the statistical information in decision tasks), 2002-2003




    Send all requests/comments to: Donato Malerba.

    Last update: July 22nd, 2003