Clustering Of Related Structured Objects

A short description 
Experimental results:
Topographic map interpretation 
Geo-referenced census data analysis 
The distribution package 
Related publications 
Authors & Acknowledgement

A short description

CORSO is a spatial data mining method that detects clusters of spatial data by taking into account both the internal structure of the objects and the (spatial) links among them. Clusters are obtained by grouping together homogeneous (i.e. structurally similar) spatial objects that are related to each other according to one or more (spatial) relations, which define a graph structure on the data. To this aim, CORSO resorts to a multi-relational data mining approach to model homogeneity over the relational structure embedded in spatial data, and exploits the concept of graph neighborhood to capture the relational constraints embedded in the graph edges. In addition, the method uses a first-order formalism to represent the internal structure of the data and evaluates similarity among spatial objects by computing logical models of groups of objects in an ILP (Inductive Logic Programming) setting.

More formally, the problem solved by CORSO is the following:
Given:
a set of structured objects O,
a background knowledge BK,
a binary relation R expressing links among objects in O;
Find:
a set of homogeneous clusters C, composed of objects in O, that is feasible with R.
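
The problem statement above can be illustrated with a minimal Python sketch. It is not the CORSO algorithm: CORSO evaluates homogeneity through logical models learned in an ILP setting, whereas this sketch assumes each object is encoded as a plain set of features and swaps in Jaccard similarity as a stand-in measure. The names `jaccard` and `cluster_related_objects`, and the way the threshold is applied, are hypothetical and only meant to show how a relation R restricts which objects may be grouped together.

```python
from collections import deque


def jaccard(a, b):
    """Jaccard similarity of two feature sets (a stand-in, not CORSO's measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_related_objects(objects, relation, threshold):
    """Greedy neighbourhood expansion constrained by the relation R.

    objects:   dict mapping object id -> set of features (assumed encoding)
    relation:  iterable of (id, id) pairs, treated here as symmetric links
    threshold: minimum similarity a linked candidate needs to join a cluster
    """
    # Build the graph neighbourhood of every object from the pairs in R.
    neighbours = {o: set() for o in objects}
    for a, b in relation:
        neighbours[a].add(b)
        neighbours[b].add(a)

    assigned = set()
    clusters = []
    for seed in sorted(objects):
        if seed in assigned:
            continue
        members = [seed]
        assigned.add(seed)
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            # Only objects linked to a current member are ever considered.
            for cand in sorted(neighbours[current]):
                if cand in assigned:
                    continue
                # Homogeneity check: similar enough to every current member.
                if all(jaccard(objects[cand], objects[m]) >= threshold
                       for m in members):
                    assigned.add(cand)
                    members.append(cand)
                    queue.append(cand)
        clusters.append(members)
    return clusters
```

In this sketch two objects with identical descriptions still end up in different clusters when no chain of links between homogeneous objects connects them, which is the role played by the requirement that C be feasible with R.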

Topographic map interpretation

In this experiment, a topographic map of the Ofanto area (Apulia, Italy) is provided. The map is a grid of square cells: each spatial object corresponds to a cell, and the adjacency relation among cells forms the edges of the discrete spatial structure. The internal spatial structure of each cell is described in terms of both the geometry and the semantics of the entities it contains. An example of a cell description is provided below:

An example of a cell description in the Ofanto dataset
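
The adjacency relation among grid cells mentioned above can be sketched in Python as follows. The function name `grid_adjacency`, the (row, column) cell identifiers and the `diagonal` switch are assumptions made for illustration; the page does not say whether cells touching only at a corner count as adjacent in the Ofanto experiment, so both variants are shown.

```python
def grid_adjacency(n_rows, n_cols, diagonal=False):
    """Return the set of adjacent cell pairs of an n_rows x n_cols grid.

    Each cell is identified by a (row, col) tuple. Every pair is stored once,
    with the cell that comes first in row-major order listed first.
    """
    # Looking only "forward" (right, down, and optionally the two downward
    # diagonals) yields each unordered pair exactly once.
    steps = [(0, 1), (1, 0)]
    if diagonal:
        steps += [(1, 1), (1, -1)]

    edges = set()
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    edges.add(((r, c), (rr, cc)))
    return edges
```

The resulting pairs can serve as the edges of the discrete spatial structure, that is, as the relation linking each cell to its neighbours.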

The problem is to identify clusters of adjacent cells in the map such that the geographical data inside each cluster properly model the spatial continuity of some morphological environment, while separate clusters model spatial variation over the entire space.
Experimental results show that the granularity of the partitioning changes as the homogeneity threshold varies, as shown in the following figure.
Graphical representation of some results obtained with CORSO by varying the homogeneity threshold

Geo-referenced census data analysis

In this experiment, the goal is to jointly analyze the socio-economic factors represented in census data and the geographical factors represented in topographic maps, in order to support good public policy.
In this case, the spatial objects are the territorial units for which census data are collected, together with entities of geographical layers such as urban and wood areas, while the discrete spatial structure over the data is defined by the adjacency relation.
Experimental results show that, when the homogeneity threshold is equal to 0.95, CORSO detects 163 clusters which, on closer analysis, lead to three adjacent areas (namely C1, C2 and C3), represented in the following figure:

Visualisation of clusters detected by CORSO with a homogeneity threshold equal to 0.95

C1, C2 and C3 cover adjacent areas with quite similar ranges of values for the deprivation indexes, but C1 models the presence of woods, while C2 and C3 model the presence of small and large urban areas, respectively.

The distribution package

CORSO is a Java application, but running it requires the ATRE learning system, which was developed for the Windows operating system; hence, the system only works on Microsoft Windows platforms.

Package: CORSO (2032 KB)
Requirements:
O.S.: MS Windows 95/98/2000/ME/XP
Needed space on HD: 11.1 MB
J.R.E. version: 1.5.0 or later
Description:
This package includes two main components: a client part, named RGDBSCAN, and a server part, named ATRE4J.
The first one has been developed in Java and implements the CORSO algorithm; its name reflects its similarity to the RGDBSCAN algorithm, although CORSO has since become a completely different clustering method.
The second component, in turn, is composed of a wrapper written in Java and the ATRE learning system (executable on Windows platforms); this component computes the cluster models.
After downloading, unzip the file into a directory of your choice on the hard disk and run the 'corso.bat' batch file from the DOS command line. This batch file automatically launches the ATRE server and the clustering client according to the command line arguments provided; comprehensive help on program usage is displayed by typing 'corso' (with no arguments) at the DOS command line. Alternatively, the user can manually launch both the ATRE server and the clustering client, either by using the batch files in the corresponding subfolders (in this case, the 'rgdbscan.bat' file has to be edited to change the input parameters) or from the DOS command line.

Warning: CORSO is free for evaluation, research and teaching purposes, but not for commercial purposes.
Please Acknowledge

Related publications

  • D. Malerba, A. Appice, A. Varlaro & A. Lanza (2005). Spatial Clustering of Structured Objects. Proceedings of 15th International Conference on Inductive Logic Programming (ILP 2005), Bonn, Germany.
  • A. Appice, A. Lanza & A. Varlaro (2005). Spatial Clustering of Related Structured Objects for Topographic Map Interpretation. Proceedings of the Workshop on Mining Spatio-Temporal Data (MSTD) in conjunction with ECML/PKDD 2005, 9-21, Porto, Portugal.
  • A. Varlaro, A. Appice, A. Lanza & D. Malerba (2005). An ILP Approach to Spatial Clustering, Convegno Italiano di Logica Computazionale (CILC 2005), Roma, Italy.
  • A. Varlaro, A. Appice, A. Lanza, D. Malerba & G. Guarnieri (2005). Relational Clustering with Discrete Spatial Structure, Proceedings of the 13th Italian Symposium on Advanced Database Systems (SEBD 2005), 149-160, Bressanone (Bolzano), Italy.
  • A. Lanza, F. Esposito, D. Malerba, A. Appice & A. Varlaro (2004). Knowledge Discovery from Maps for Environmental Protection. In H. Voss and M. Wachowicz (Eds.), Note of the KdNet Workshop Symposium KD for Environmental Management, Bonn, Germany.
  • F. Esposito, D. Malerba & G. Semeraro (1991). Flexible matching for noisy structural descriptions. In International Joint Conference on Artificial Intelligence, 658-664.

    Project team

    Project Leader

    Donato Malerba

    LACAM Staff

  • Dr. Annalisa APPICE
  • Antonio VARLARO
  • Prof. Antonietta LANZA

    Students involved in the project

  • Giuseppe Guarnieri
  • Antonio Fittipaldi


    Name: Annalisa Appice
    Email address: appice _AT_
    Tel. number: +39 080 5443262
    Fax: +39 080 5443262