Clustering Of Related Structured Objects
A short description
CORSO is a spatial data mining method that detects clusters of spatial data by taking into account both their internal structure and the (spatial) links among them. Clusters are obtained by grouping together homogeneous (i.e. structurally similar) spatial objects that are related to each other according to one or more (spatial) relations defining a graph structure on the data. To this end, CORSO uses a multi-relational data mining approach to model homogeneity over the relational structure embedded in spatial data, and exploits the concept of graph neighborhood to capture the relational constraints expressed by the graph edges. In addition, the method uses a first-order formalism to represent the internal structure of the data and evaluates similarity among spatial objects by computing logical models of groups of objects in an ILP (Inductive Logic Programming) setting.
More formally, the problem solved by CORSO is the following.
Given:
· a set of structured objects O,
· a background knowledge BK,
· a binary relation R expressing links among objects in O,
Find:
· a set of homogeneous clusters C, composed of objects in O, that is feasible with R.
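The problem statement can be illustrated with a minimal Python sketch. It is not the CORSO implementation: objects are reduced to hypothetical feature sets, homogeneity is approximated by the lowest pairwise Jaccard similarity in a cluster (CORSO instead computes logical models in an ILP setting), and the background knowledge BK is left out. The names `jaccard`, `homogeneity` and `cluster` are assumptions made for this example only.

```python
from collections import deque


def jaccard(a, b):
    """Toy similarity between two objects described as feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def homogeneity(members, objects):
    """Assumed measure: lowest pairwise similarity inside a candidate cluster."""
    members = list(members)
    if len(members) < 2:
        return 1.0
    return min(
        jaccard(objects[x], objects[y])
        for i, x in enumerate(members)
        for y in members[i + 1:]
    )


def cluster(objects, relation, threshold):
    """Grow clusters along the links in R, keeping each cluster homogeneous."""
    # Undirected graph neighbourhood induced by the binary relation R.
    neighbours = {o: set() for o in objects}
    for a, b in relation:
        neighbours[a].add(b)
        neighbours[b].add(a)

    assigned, clusters = set(), []
    for seed in sorted(objects):
        if seed in assigned:
            continue
        current = {seed}
        assigned.add(seed)
        frontier = deque([seed])
        while frontier:
            node = frontier.popleft()
            for cand in sorted(neighbours[node]):
                if cand in assigned:
                    continue
                # Accept a linked object only if the cluster stays homogeneous.
                if homogeneity(current | {cand}, objects) >= threshold:
                    current.add(cand)
                    assigned.add(cand)
                    frontier.append(cand)
        clusters.append(current)
    return clusters


# Hypothetical toy data: four objects linked in a chain.
objects = {
    "o1": {"river", "road"},
    "o2": {"river", "road"},
    "o3": {"wood"},
    "o4": {"wood", "path"},
}
relation = [("o1", "o2"), ("o2", "o3"), ("o3", "o4")]
clusters = cluster(objects, relation, threshold=0.5)
print(clusters)
```

In this sketch a cluster can only grow across pairs in R, which is one simple way to read the requirement that C be feasible with R.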
Topographic map interpretation
In this experiment, a topographic map of the Ofanto zone (Apulia, Italy) is used. The map is a grid of square cells, where each spatial object corresponds to a cell, and the adjacency relation among cells forms the edges of the discrete spatial structure. The internal spatial structure of each cell is described in terms of both the geometries and the semantics of the entities included in the cell. An example of cell description is provided below:
The task is to identify clusters of adjacent cells in the map such that the geographical data inside each cluster properly model the spatial continuity of some morphological environment, while separate clusters model spatial variation over the entire space.
Experimental results show that the granularity of the partitioning changes as the homogeneity threshold varies, as shown in the following figure.
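As an illustration of the discrete spatial structure, the adjacency relation over a grid of square cells could be built as in the sketch below. The function name `grid_adjacency` and the `diagonal` switch are assumptions for this example; the source does not state whether diagonal neighbours count as adjacent.

```python
def grid_adjacency(rows, cols, diagonal=False):
    """Return the set of unordered adjacent cell pairs on a rows x cols grid."""
    # Each pair is generated once by looking only right and down
    # (plus the two downward diagonals when requested).
    steps = [(0, 1), (1, 0)]
    if diagonal:
        steps += [(1, 1), (1, -1)]
    edges = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in steps:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    edges.add(((r, c), (nr, nc)))
    return edges


edges = grid_adjacency(2, 3)
print(len(edges), "adjacent cell pairs")
```

Cells are identified here by (row, column) pairs; the resulting edge set plays the role of the relation that links cells during clustering.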
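The effect of the homogeneity threshold on granularity can be seen even in a much simplified setting. The sketch below is not CORSO: it merges linked objects whenever a toy similarity between two hypothetical numeric values reaches the threshold, and counts the resulting groups with a union-find structure. All names and values are invented for illustration.

```python
def count_clusters(values, edges, threshold):
    """Count groups when only sufficiently similar linked objects merge."""
    parent = {k: k for k in values}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        # Toy similarity; the values are assumed to lie in [0, 1].
        similarity = 1.0 - abs(values[a] - values[b])
        if similarity >= threshold:
            parent[find(a)] = find(b)
    return len({find(k) for k in values})


# Hypothetical objects linked in a chain a-b-c-d-e.
values = {"a": 0.10, "b": 0.15, "c": 0.40, "d": 0.45, "e": 0.90}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
counts = {t: count_clusters(values, edges, t) for t in (0.5, 0.7, 0.9, 0.99)}
print(counts)
```

A stricter threshold lets fewer linked objects merge, so the same data split into more and smaller clusters.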
Geo-referenced census data analysis
In this experiment, the goal is to jointly analyze the socio-economic factors represented in census data and the geographical factors represented in topographic maps, in order to support sound public policy.
In this case, the spatial objects are the territorial units for which census data are collected, together with entities of geographical layers such as urban and wood areas, while the discrete spatial structure over the data is defined by the adjacency relation.
Experimental results show that, when the homogeneity threshold is equal to 0.95, CORSO detects 163 clusters which, after closer analysis, lead to three adjacent areas (namely C1, C2 and C3), represented in the following figure:
C1, C2 and C3 cover adjacent areas with quite similar ranges of values for the deprivation indexes, but C1 models the presence of woods, while C2 and C3 model the presence of small urban areas and large urban areas, respectively.
The distribution package
CORSO is a Java application, but running it requires the ATRE learning system (developed for the Windows operating system); hence, the system works only on Microsoft Windows platforms.
CORSO (2032 KB)
O.S.: MS Windows 95/98/2000/ME/XP
Needed space on HD: 11.1 MB
J.R.E. version: 1.5.0 or later
This package includes two main components: a client part, named RGDBSCAN, and a server part, named ATRE4J.
The first component is developed in Java and implements the CORSO algorithm; its name stems from its similarities to the RGDBSCAN algorithm, although CORSO is by now a completely different clustering method.
The second component consists of a wrapper written in Java and the ATRE learning system (executable on Windows platforms); it computes the cluster models.
After downloading, unzip the file into a directory of your choice on the hard disk and run the 'corso.bat' batch file from the DOS command line.
This batch file automatically launches the ATRE server and the clustering client according to the supplied command-line arguments; comprehensive help on program usage is displayed by typing 'corso' (with no arguments) at the DOS command line.
Alternatively, the user can launch the ATRE server and the clustering client manually, either by using the batch files in the corresponding subfolders (in this case, the 'rgdbscan.bat' file has to be edited to change the input parameters) or from the DOS command line.
CORSO is free for evaluation, research and teaching purposes, but not for commercial use.
D. Malerba, A. Appice, A. Varlaro & A. Lanza (2005). Spatial Clustering of Structured Objects. Proceedings of 15th International Conference on Inductive Logic Programming (ILP 2005), Bonn, Germany.
A. Appice, A. Lanza & A. Varlaro (2005). Spatial Clustering of Related Structured Objects for Topographic Map Interpretation. Proceedings of the Workshop on Mining Spatio-Temporal Data (MSTD) in conjunction with ECML/PKDD 2005, 9-21, Porto, Portugal.
A. Varlaro, A. Appice, A. Lanza & D. Malerba (2005). An ILP Approach to Spatial Clustering, Convegno Italiano di Logica Computazionale (CILC 2005), Roma, Italy.
A. Varlaro, A. Appice, A. Lanza, D. Malerba & G. Guarnieri (2005). Relational Clustering with Discrete Spatial Structure, Proceedings of the 13th Italian Symposium on Advanced Database Systems (SEBD 2005), 149-160, Bressanone (Bolzano), Italy.
A. Lanza, F. Esposito, D. Malerba, A. Appice & A. Varlaro (2004). Knowledge Discovery from Maps for Environmental Protection. In H. Voss and M. Wachowicz (Eds.), Note of the KdNet Workshop Symposium KD for Environmental Management, Bonn, Germany.
F. Esposito, D. Malerba & G. Semeraro (1991). Flexible matching for noisy structural descriptions. In International Joint Conference on Artificial Intelligence, 658–664.
Dr. Annalisa APPICE
Prof. Antonietta LANZA
Students involved in the project
appice _AT_ di.uniba.it
+39 080 5443262
+39 080 5443262