Summarization and Interpolation

SUMATRA and TRECI





 



A short description

Emerging real-life applications, such as environmental compliance, ecological studies and meteorology, rely on real-time data acquisition through a number of (wireless) remote sensors. In practice, the sensors are installed across a spatially distributed network; they gather information along a number of attribute dimensions and periodically feed a central server with the measured data. The server is required to monitor these data, issue alarms when needed and compute fast aggregates. Since the analysis requests submitted to the server may concern both present and past data, the server would have to store the entire stream. With massive streams (large networks and/or frequent transmissions), however, the limited storage capacity of the server may force it to reduce the amount of data kept on disk. One way to address the storage limit is to compute summaries of the data as they arrive, discard the raw data, and later use the summaries to interpolate the discarded values. Whenever the discarded data must be analyzed again, the server reconstructs them from the summaries stored in the database and processes them as requested.
To this end, we have designed two techniques. SUMATRA is a summarization technique that segments the stream into windows, computes summaries window by window and stores these summaries in a database. TRECI is an interpolation technique that uses the inverse distance weighting approach to approximate observed data and to estimate missing data from trend clusters. Trend clusters are discovered as the summary of each window: they are clusters of georeferenced data that vary according to a similar trend along the window time horizon.
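As an illustration of the inverse distance weighting idea mentioned above, the following Java sketch estimates a value at an unobserved location as a weighted average of known values, with weights that decrease with distance. It is a generic, minimal example and not the TRECI implementation: the class name IdwSketch, the method interpolate, the use of Euclidean distance on planar coordinates and the power parameter are all assumptions made for illustration. In particular, it works on plain point samples rather than on trend clusters.

```java
// Minimal sketch of inverse distance weighting (IDW).
// Hypothetical helper, NOT part of the SUMATRA/TRECI distribution.
final class IdwSketch {

    /**
     * Estimates the value at (x, y) from known samples.
     *
     * @param points known locations, one {x, y} pair per row (assumed planar coordinates)
     * @param values measured value at each known location
     * @param power  exponent applied to the distance (2.0 is a common choice; an assumption here)
     */
    static double interpolate(double[][] points, double[] values,
                              double x, double y, double power) {
        if (points.length == 0 || points.length != values.length) {
            throw new IllegalArgumentException("points and values must be non-empty and aligned");
        }
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < points.length; i++) {
            double dist = Math.hypot(points[i][0] - x, points[i][1] - y);
            if (dist == 0.0) {
                // The query coincides with a known location: return its value directly.
                return values[i];
            }
            double weight = 1.0 / Math.pow(dist, power);
            weightedSum += weight * values[i];
            weightTotal += weight;
        }
        return weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.0}, {2.0, 0.0}};
        double[] values = {10.0, 20.0};
        // The query point is equidistant from both samples, so the estimate is their mean.
        System.out.println(interpolate(points, values, 1.0, 0.0, 2.0)); // prints 15.0
    }
}
```

A larger power makes the estimate follow the nearest samples more closely, while a smaller one smooths it towards the overall mean. For sensors located by latitude and longitude, a geographic distance would presumably replace the Euclidean one used here.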

The distribution package

Both SUMATRA and TRECI are implemented in a Java system, which interfaces with a MySQL database to read the network structure (nodes and arcs).

jar: SUMATRA/TRECI
Description: This RAR bundle contains (1) SumatraTreci.jar, which allows us to (i) perform trend cluster discovery in order to summarize a geophysical data stream and (ii) compute interpolations of the unknown data from the trend cluster summarization; (2) setup files; and (3) a benchmark data stream (NDBC).

Warning: Both SUMATRA and TRECI are free for evaluation, research and teaching purposes, but not for commercial purposes.
Please Acknowledge
 



Related publications

Annalisa Appice, Anna Ciampi, Donato Malerba
Summarizing numeric spatial data streams by trend cluster discovery. Data Min. Knowl. Discov. 29(1): 84-136 (Online 2013, Printed 2015)

Annalisa Appice, Anna Ciampi, Donato Malerba, Pietro Guccione
Using trend clusters for spatiotemporal interpolation of missing data in a sensor network. J. Spatial Information Science 6(1): 119-153 (2013)

Annalisa Appice, Anna Ciampi, Fabio Fumarola, Donato Malerba
Data Mining Techniques in Sensor Networks - Summarization, Interpolation and Surveillance. Springer Briefs in Computer Science, Springer 2014, ISBN 978-1-4471-5453-2, pp. I-XIII, 1-105
 



Project team

  • Dr. Annalisa APPICE (SUMATRA, TRECI)
  • Dr. Anna CIAMPI (SUMATRA, TRECI)
  • Dr. Ing. Pietro GUCCIONE (TRECI)
  • Prof. Donato MALERBA (SUMATRA, TRECI)


    Contact

    Name: Annalisa Appice
    Email address: annalisa.appice@uniba.it
    Tel. number: +39 080 5443262
    Fax: +39 080 5443262
     
