Kdde - User contributions [en]
http://www.di.uniba.it/~kdde/index.php/Special:Contributions/AnnaCiampi
From Kdde, en, MediaWiki 1.15.1, Sat, 12 Jul 2014 23:01:42 GMT
SUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: </p>
<hr />
<div><br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' '''D''' (e.g., sensed data), in which sets of observations of a ''numeric'' dimension are transmitted at equally spaced time points by a variable number of georeferenced sources (e.g., sensors). <br />
<br />
The goal of SUMATRA is to find a compact and accurate representation '''P''' of '''D''', such that '''P''' can be stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted from '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive, equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values as they are transmitted at the equally spaced time points of the window. <br />
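A trend polyline can be illustrated with a small Python sketch. It is not the SUMATRA implementation: the function name <code>trend_polyline</code>, the representation of a snapshot as a dictionary from source id to value, and the choice of the median as the value fitted at each time point are all assumptions made for illustration.<br />

```python
from statistics import median
from typing import Dict, List, Sequence

# Hypothetical representation: one snapshot maps a source id to the value
# that source transmitted at one time point of the window.
Snapshot = Dict[str, float]


def trend_polyline(window: Sequence[Snapshot], cluster: Sequence[str]) -> List[float]:
    """Return one polyline vertex per time point of the window.

    Assumption: the vertex at a time point is the median of the values
    transmitted at that time point by the clustered sources that reported.
    """
    polyline = []
    for snapshot in window:
        values = [snapshot[source] for source in cluster if source in snapshot]
        if not values:
            raise ValueError("no clustered source transmitted at this time point")
        polyline.append(median(values))
    return polyline
```

Joining consecutive vertices with straight-line segments gives the polyline; a source that is missing from a snapshot is simply skipped at that time point, which matches the idea of a variable number of sources.<br />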
<br />
<br />
Mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
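To make the compress-then-reconstruct idea concrete, here is a minimal Python sketch of Haar wavelet compression applied to a polyline. It is an illustration under stated assumptions, not SUMATRA's code: it uses the unnormalized (averaging) Haar variant, requires a power-of-two number of points, and the helper names <code>haar_forward</code>, <code>haar_inverse</code> and <code>keep_largest</code> as well as the <code>keep</code> count are hypothetical.<br />

```python
from typing import List, Sequence


def haar_forward(values: Sequence[float]) -> List[float]:
    """Averaging Haar transform; the input length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs


def haar_inverse(coeffs: Sequence[float]) -> List[float]:
    """Invert haar_forward: rebuild the values from averages and details."""
    values = [float(c) for c in coeffs]
    length = 1
    while length < len(values):
        expanded = []
        for avg, det in zip(values[:length], values[length:2 * length]):
            expanded.extend([avg + det, avg - det])
        values[:2 * length] = expanded
        length *= 2
    return values


def keep_largest(coeffs: Sequence[float], keep: int) -> List[float]:
    """Zero every coefficient except the `keep` largest in magnitude."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(ranked[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
```

Storing only the non-zero coefficients is what makes the representation compact; calling <code>haar_inverse</code> on them yields an approximation of the original polyline rather than an exact copy.<br />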
<br />
<br />
===Architecture===<br />
<br />
[[Image:SumatraN.jpg|1005px|SUMATRA]]<br />
<br />
A buffer consumes snapshots as they arrive at equally spaced time points and passes them window by window to SUMATRA. <br />
<br />
After a window has gone through SUMATRA, it is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process therefore consists of three steps:<br />
<br />
*the snapshots that compose a window are buffered in the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
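The three steps above can be sketched in Python as follows. This is a hedged illustration only: the class name <code>WindowSummarizer</code>, the <code>discover</code> callback standing in for trend cluster discovery and compression, and the plain list standing in for the data warehouse are all hypothetical.<br />

```python
from typing import Any, Callable, Dict, List

Snapshot = Dict[str, float]  # hypothetical: source id -> value at one time point
Window = List[Snapshot]


class WindowSummarizer:
    """Buffer snapshots, summarize each completed window, then discard it."""

    def __init__(self, w: int, discover: Callable[[Window], Any], warehouse: List[Any]):
        if w <= 1:
            raise ValueError("w must be greater than 1")
        self.w = w
        self.discover = discover    # placeholder for trend cluster discovery
        self.warehouse = warehouse  # stand-in for the data warehouse
        self.synopsis: Window = []  # snapshots of the window being filled

    def push(self, snapshot: Snapshot) -> None:
        # Step 1: buffer the snapshot in the data synopsis.
        self.synopsis.append(snapshot)
        if len(self.synopsis) < self.w:
            return
        # Step 2: compute the summary of the completed window.
        summary = self.discover(self.synopsis)
        # Step 3: store the summary, then discard the raw window.
        self.warehouse.append(summary)
        self.synopsis = []
```

Only summaries reach the warehouse; the raw snapshots of a completed window are not retained anywhere after step 3, so memory use is bounded by one window.<br />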
<br />
The input parameters of trend cluster discovery in SUMATRA are the number $w$ of snapshots that compose a window ($w>1$) and the domain similarity threshold $\delta$ ($\delta \in [0,1]$). <br />
The input parameter of polyline compression is either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
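The constraints on these parameters can be captured in a small validation sketch. The class <code>SumatraParameters</code> and its field names are hypothetical and do not reflect the format of the XML parameter files; reading "either ... or" as "exactly one of the two" is also an assumption.<br />

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SumatraParameters:
    """Hypothetical container for the input parameters described above."""

    w: int                           # number of snapshots per window, w > 1
    delta: float                     # domain similarity threshold in [0, 1]
    epsilon: Optional[float] = None  # error threshold for compression
    sigma: Optional[float] = None    # compression degree threshold

    def __post_init__(self) -> None:
        if self.w <= 1:
            raise ValueError("w must be greater than 1")
        if not 0.0 <= self.delta <= 1.0:
            raise ValueError("delta must lie in [0, 1]")
        # Assumption: exactly one compression threshold is supplied.
        if (self.epsilon is None) == (self.sigma is None):
            raise ValueError("give exactly one of epsilon or sigma")
```

For example, <code>SumatraParameters(w=4, delta=0.2, epsilon=0.05)</code> is accepted, while omitting both thresholds or supplying both raises a <code>ValueError</code>.<br />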
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Download <br />
**the source code ([[File:Src.jar|Src.jar]])<br />
**the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) for building the graph, which requires an XML input file of parameters ([[File:BuildingParameters.txt]]), <br />
**the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]), and <br />
**the SQL script for the data warehouse ([[File:db.txt]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds], [[File:NodeB.csv]]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
Pietro Guccione, Anna Ciampi, Donato Malerba, Annalisa Appice, Angelo Muolo: '''Trend Cluster based Interpolation Everywhere in a Sensor Network'''. 27th Symposium on Applied Computing, ACM SAC 2012<br />
<br />
Annalisa Appice, Anna Ciampi, Angelo Muolo, Antonietta Lanza, Lorenzo Longo, Donato Malerba: '''Fault Diagnosis in Smart Grid of PhotoVoltaic Plants'''. Next Generation Data Mining Summit: Ubiquitous Knowledge Discovery for Energy Management in Smart Grids and Intelligent Machine-to-Machine (M2M) Telematics (NGDM11)<br />
<br />
Pietro Guccione, Anna Ciampi, Annalisa Appice, Donato Malerba, Angelo Muolo: '''Spatio-Temporal Reconstruction of Un-Sampled Data in a Sensor Network'''. Mining Ubiquitous and Social Environments (MUSE) 2011<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba, Angelo Muolo: '''An Intelligent System for Real Time Fault Detection in PV Plants'''. Sustainability in Energy and Buildings, SEB'11 Marseilles<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba, Pietro Guccione: '''Trend cluster based compression of geographically distributed data streams'''. CIDM 2011: 168-175<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba, Angelo Muolo: '''Space-Time Roll-up and Drill-down into Geo-Trend Stream Cubes'''. ISMIS 2011: 365-375<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba: '''Summarization for Geographically Distributed Data Streams'''. KES (3) 2010: 339-348<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba: '''Online and Offline Trend Cluster Discovery in Spatially Distributed Data Streams'''. MSM/MUSE 2010: 142-161<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba: '''Discovering Trend-Based Clusters in Spatially Distributed Data Streams'''. Mining Ubiquitous and Social Environments (MUSE 2010)<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba, Giuseppe Saponaro, Domenico Triglione: '''Clustering Spatio-Temporal Data Streams''' [[Media:Sebd10.pdf]]. SEBD 2010: 230-241</div>Wed, 18 Jan 2012 14:14:46 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRASUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* Related publications */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The scope of is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored into a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique which segments a geographically distributed data stream into consecutive equal sized windows. Each time a new window is completed, SUMATRA summarizes the window, stores the summaries into a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal patterns, called ''trend clusters''. These patterns are ''spatial clusters'' of sources which transmit values, whose temporal variation, called ''trend polyline'', is similar over the ''time horizon'' of the window. This trend polyline is drawn as the sequence of straight-line segments which fit clustered values as they are transmitted at equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA in order to derive a compact representation of trend polylines. After storage into data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of data stream.<br />
<br />
<br />
After the storage into data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of data stream (''drill-down'').<br />
<br />
<br />
===Architecture===<br />
<br />
[[Image:SumatraN.jpg|1005px|SUMATRA]]<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is definitely<br />
discarded, while a compact representation of discovered trend clusters is stored into a data warehouse. <br />
<br />
This way, the summarization process is three-stepped:<br />
<br />
*snapshots which compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded form data synopsis, while a compact representation of trend clusters is computed and stored into the data warehouse.<br />
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on every machine with a JVM (1.6) an MySQL (5 or higher) installed.<br />
<br />
*Download <br />
**the source code ([[File:Src.jar|Src.jar]])<br />
**the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]), which requires an XML input file of parameters ([[File:BuildingParameters.txt]]) to build the graph, <br />
**the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]), and <br />
**the SQL script for the data warehouse ([[File:db.txt]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds], [[File:NodeB.csv]]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
Pietro Guccione, Anna Ciampi, Annalisa Appice, Donato Malerba and Angelo Muolo: Spatio-Temporal Reconstruction of Un-Sampled Data in a Sensor Network. Mining Ubiquitous and Social Environments (MUSE) 2011<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba, Pietro Guccione: Trend cluster based compression of geographically distributed data streams. CIDM 2011: 168-175<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba, Angelo Muolo: Space-Time Roll-up and Drill-down into Geo-Trend Stream Cubes. ISMIS 2011: 365-375<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba: Summarization for Geographically Distributed Data Streams. KES (3) 2010: 339-348<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba: Online and Offline Trend Cluster Discovery in Spatially Distributed Data Streams. MSM/MUSE 2010: 142-161<br />
<br />
Anna Ciampi, Annalisa Appice, and Donato Malerba: Discovering Trend-Based Clusters in Spatially Distributed Data Streams. Mining Ubiquitous and Social Environments (MUSE 2010)<br />
<br />
Anna Ciampi, Annalisa Appice, Donato Malerba, Giuseppe Saponaro, Domenico Triglione: Clustering Spatio-Temporal Data Streams [[Media:Sebd10.pdf]]. SEBD 2010: 230-241</div>Wed, 18 Jan 2012 13:54:46 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRASUMATRA
File:Sebd10.pdf
http://www.di.uniba.it/~kdde/index.php/File:Sebd10.pdf
<p>AnnaCiampi: </p>
<hr />
<div></div>Wed, 18 Jan 2012 13:35:19 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:Sebd10.pdfSUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: </p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique which segments a geographically distributed data stream into consecutive equal-sized windows. Each time a new window is completed, SUMATRA summarizes the window, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources which transmit values whose temporal variation, called the ''trend polyline'', is similar over the ''time horizon'' of the window. The trend polyline is drawn as the sequence of straight-line segments which fit the clustered values as they are transmitted at the equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated into SUMATRA in order to derive a compact representation of trend polylines.<br />
<br />
<br />
After storage in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
<br />
===Architecture===<br />
<br />
[[Image:SumatraN.jpg|1005px|SUMATRA]]<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window has gone through SUMATRA, it is permanently discarded, while a compact representation of the discovered trend clusters is stored in the data warehouse. <br />
<br />
The summarization process therefore consists of three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
The input parameters of trend cluster discovery in SUMATRA are the number $w$ of snapshots that compose a window ($w>1$) and the domain similarity threshold $\delta$ ($\delta \in [0,1]$). <br />
The input parameter of the polyline compression is either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
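The constraints on these parameters can be captured in a small validation sketch. The class name <code>SumatraParameters</code> and its field names are hypothetical, and reading "either $\epsilon$ or $\sigma$" as "exactly one of the two" is an assumption; the actual XML parameter files may be organized differently.<br />

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SumatraParameters:
    """Hypothetical container for the input parameters described above."""
    w: int                           # snapshots per window, w > 1
    delta: float                     # domain similarity threshold in [0, 1]
    epsilon: Optional[float] = None  # error threshold for polyline compression
    sigma: Optional[float] = None    # compression degree threshold

    def __post_init__(self):
        if self.w <= 1:
            raise ValueError("w must be greater than 1")
        if not 0.0 <= self.delta <= 1.0:
            raise ValueError("delta must lie in [0, 1]")
        # Assumption: exactly one compression threshold is supplied.
        if (self.epsilon is None) == (self.sigma is None):
            raise ValueError("give exactly one of epsilon or sigma")


params = SumatraParameters(w=16, delta=0.1, epsilon=0.05)
```

Rejecting invalid combinations at construction time keeps the checks in one place before any window is processed.<br />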
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Download <br />
**the source code ([[File:Src.jar|Src.jar]])<br />
**the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) that builds the graph, which requires an XML input file of parameters ([[File:BuildingParameters.txt]]), <br />
**the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]), and <br />
**the SQL script for the data warehouse ([[File:db.txt]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds], [[File:NodeB.csv]]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, <br />
''Summarization for geographically distributed data streams'', <br />
in Proceedings of the 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, <br />
KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>
Wed, 18 Jan 2012 13:20:10 GMT
AnnaCiampi
http://www.di.uniba.it/~kdde/index.php/Talk:SUMATRA
SUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* Architecture */</p>
<hr />
<div></div>
Wed, 18 Jan 2012 12:21:25 GMT
AnnaCiampi
http://www.di.uniba.it/~kdde/index.php/Talk:SUMATRA
File:SumatraN.jpg
http://www.di.uniba.it/~kdde/index.php/File:SumatraN.jpg
<p>AnnaCiampi: </p>
<hr />
<div></div>
Wed, 18 Jan 2012 12:20:48 GMT
AnnaCiampi
http://www.di.uniba.it/~kdde/index.php/File_talk:SumatraN.jpg
File:Sumatra.jpg
http://www.di.uniba.it/~kdde/index.php/File:Sumatra.jpg
<p>AnnaCiampi: uploaded a new version of "File:Sumatra.jpg"</p>
<hr />
<div>SUMATRA Architecture</div>
Wed, 18 Jan 2012 12:19:37 GMT
AnnaCiampi
http://www.di.uniba.it/~kdde/index.php/File_talk:Sumatra.jpg
SUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* Architecture */</p>
<hr />
<div></div>
Wed, 18 Jan 2012 12:19:03 GMT
AnnaCiampi
http://www.di.uniba.it/~kdde/index.php/Talk:SUMATRA
SUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* Architecture */</p>
<hr />
<div></div>
Wed, 18 Jan 2012 12:18:49 GMT
AnnaCiampi
http://www.di.uniba.it/~kdde/index.php/Talk:SUMATRA
SUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div></div>
Wed, 18 Jan 2012 12:12:30 GMT
AnnaCiampi
http://www.di.uniba.it/~kdde/index.php/Talk:SUMATRA
SUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div></div>
Wed, 18 Jan 2012 12:12:09 GMT
AnnaCiampi
http://www.di.uniba.it/~kdde/index.php/Talk:SUMATRA
File:Src.jar
http://www.di.uniba.it/~kdde/index.php/File:Src.jar
<p>AnnaCiampi: </p>
<hr />
<div></div>
Wed, 18 Jan 2012 12:11:30 GMT
AnnaCiampi
http://www.di.uniba.it/~kdde/index.php/File_talk:Src.jar
SUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The scope of is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored into a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive, equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values at the equally spaced time points of the window. <br />
<br />
<br />
Mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
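<br />
The compression of a trend polyline can be illustrated with a minimal Python sketch of Haar wavelet compression. It is not the SUMATRA implementation: the function names, the <code>keep</code> parameter (a stand-in for the compression degree) and the sample values are assumptions made for illustration, and the polyline length is assumed to be a power of two.<br />

```python
import math


def haar_forward(values):
    """Orthonormal Haar transform of a sequence whose length is a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = list(values)
    while n > 1:
        half = n // 2
        # Pairwise scaled sums (averages) and differences (details).
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = avg + diff
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward."""
    values = list(coeffs)
    n = 1
    while n < len(values):
        merged = []
        for a, d in zip(values[:n], values[n:2 * n]):
            merged.append((a + d) / math.sqrt(2))
            merged.append((a - d) / math.sqrt(2))
        values[:2 * n] = merged
        n *= 2
    return values


def compress(polyline, keep):
    """Keep only the `keep` largest-magnitude coefficients (hypothetical parameter)."""
    coeffs = haar_forward(polyline)
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return {i: coeffs[i] for i in ranked[:keep]}


def reconstruct(sparse, length):
    """Approximate drill-down: dropped coefficients are treated as zero."""
    return haar_inverse([sparse.get(i, 0.0) for i in range(length)])


# Made-up trend polyline over a window of 8 snapshots.
polyline = [20.0, 20.5, 21.0, 21.5, 23.0, 23.5, 22.0, 21.0]
stored = compress(polyline, keep=3)        # what would go to the data warehouse
approx = reconstruct(stored, len(polyline))
max_error = max(abs(a - b) for a, b in zip(polyline, approx))
```

Keeping fewer coefficients gives a more compact summary at the cost of a larger reconstruction error; an error threshold could be enforced by increasing <code>keep</code> until <code>max_error</code> is small enough.<br />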
<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process thus consists of three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
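<br />
The three steps above can be sketched in Python as a simple buffering loop. This is only an illustration under stated assumptions: a snapshot is modelled as a dictionary from source identifier to value, and <code>mean_polyline</code> is a hypothetical placeholder for trend cluster discovery, not the actual algorithm.<br />

```python
def summarize_stream(snapshots, w, summarize, store):
    """Buffer snapshots into windows of w snapshots, summarize, store, discard."""
    if w <= 1:
        raise ValueError("w must be greater than 1")
    window = []                          # the data synopsis (step 1)
    for snapshot in snapshots:
        window.append(snapshot)
        if len(window) == w:
            store(summarize(window))     # steps 2 and 3: summarize and store
            window = []                  # the window itself is discarded


def mean_polyline(window):
    """Placeholder summary: mean value over all sources at each time point."""
    return [sum(snapshot.values()) / len(snapshot) for snapshot in window]


# Made-up stream of five snapshots from two sources.
stream = [
    {"s1": 1.0, "s2": 3.0},
    {"s1": 2.0, "s2": 4.0},
    {"s1": 5.0, "s2": 5.0},
    {"s1": 7.0, "s2": 9.0},
    {"s1": 0.0, "s2": 0.0},
]
warehouse = []
summarize_stream(stream, w=2, summarize=mean_polyline, store=warehouse.append)
```

In this sketch the fifth snapshot stays in the buffer because its window is never completed, so only two summaries reach <code>warehouse</code>.<br />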
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
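<br />
The constraints on these parameters can be written down as a small Python check. The class and field names below are hypothetical and do not reflect the names used in the XML parameter files of the distribution; only the constraints ($w>1$, $\delta$ in $[0,1]$, and either $\epsilon$ or $\sigma$) come from the description above.<br />

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SumatraParameters:
    """Hypothetical container for the input parameters described above."""
    w: int                           # number of snapshots per window
    delta: float                     # domain similarity threshold
    epsilon: Optional[float] = None  # error threshold of the compression
    sigma: Optional[float] = None    # compression degree threshold

    def __post_init__(self):
        if self.w <= 1:
            raise ValueError("w must be greater than 1")
        if not 0.0 <= self.delta <= 1.0:
            raise ValueError("delta must lie in [0, 1]")
        # Exactly one of epsilon and sigma drives the polyline compression.
        if (self.epsilon is None) == (self.sigma is None):
            raise ValueError("give either epsilon or sigma, not both or neither")


params = SumatraParameters(w=12, delta=0.1, epsilon=0.05)
```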
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Download <br />
**the source code ([[File:srcCode|src.zip]])<br />
**the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]), which requires an XML input file of parameters ([[File:BuildingParameters.txt]]), to build the graph, <br />
**the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]), and <br />
**the SQL script for the data warehouse ([[File:db.txt]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds], [[File:NodeB.csv]]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, <br />
''Summarization for geographically distributed data streams'', <br />
in Proceedings of the 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, <br />
KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Wed, 18 Jan 2012 12:06:49 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRAFile:Sumatra.jar
http://www.di.uniba.it/~kdde/index.php/File:Sumatra.jar
<p>AnnaCiampi: uploaded a new version of "File:Sumatra.jar"</p>
<hr />
<div>The runnable file of the framework.</div>Wed, 18 Jan 2012 12:02:55 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:Sumatra.jarFile:Db.txt
http://www.di.uniba.it/~kdde/index.php/File:Db.txt
<p>AnnaCiampi: </p>
<hr />
<div></div>Wed, 18 Jan 2012 11:51:52 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:Db.txtSUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' '''D''' (e.g., sensed data), in which sets of observations for a ''numeric'' dimension are transmitted at equally spaced time points from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal of SUMATRA is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive, equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values at the equally spaced time points of the window. <br />
<br />
<br />
Mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process thus consists of three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Download <br />
**the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]), which requires an XML input file of parameters ([[File:BuildingParameters.txt]]), to build the graph, <br />
**the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]), and <br />
**the SQL script for the data warehouse ([[File:db.txt]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds], [[File:NodeB.csv]]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, <br />
''Summarization for geographically distributed data streams'', <br />
in Proceedings of the 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, <br />
KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Wed, 18 Jan 2012 11:51:34 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRAFile:ClusteringParameters.txt
http://www.di.uniba.it/~kdde/index.php/File:ClusteringParameters.txt
<p>AnnaCiampi: uploaded a new version of "File:ClusteringParameters.txt"</p>
<hr />
<div></div>Wed, 18 Jan 2012 11:44:26 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:ClusteringParameters.txtSUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: </p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' '''D''' (e.g., sensed data), in which sets of observations for a ''numeric'' dimension are transmitted at equally spaced time points from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal of SUMATRA is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive, equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values at the equally spaced time points of the window. <br />
<br />
<br />
Mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process thus consists of three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Download <br />
**the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]), which requires an XML input file of parameters ([[File:BuildingParameters.txt]]), to build the graph, <br />
**the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]), and <br />
**the SQL script for the data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds], [[File:NodeB.csv]]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, <br />
''Summarization for geographically distributed data streams'', <br />
in Proceedings of the 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, <br />
KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Wed, 18 Jan 2012 11:43:27 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRASUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: </p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' '''D''' (e.g., sensed data), in which sets of observations for a ''numeric'' dimension are transmitted at equally spaced time points from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal of SUMATRA is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive, equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values at the equally spaced time points of the window. <br />
<br />
<br />
Mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:Sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process thus consists of three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Download <br />
**the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]), which requires an XML input file of parameters ([[File:BuildingParameters.txt]]), to build the graph, <br />
**the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]), and <br />
**the SQL script for the data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds], [[File:NodeB.csv]]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, <br />
''Summarization for geographically distributed data streams'', <br />
in Proceedings of the 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, <br />
KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Wed, 18 Jan 2012 11:43:08 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRAFile:Sumatra.jpg
http://www.di.uniba.it/~kdde/index.php/File:Sumatra.jpg
<p>AnnaCiampi: uploaded a new version of "File:Sumatra.jpg"</p>
<hr />
<div>SUMATRA Architecture</div>Wed, 18 Jan 2012 11:42:42 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:Sumatra.jpgFile:Sumatra.jpg
http://www.di.uniba.it/~kdde/index.php/File:Sumatra.jpg
<p>AnnaCiampi: uploaded a new version of "File:Sumatra.jpg":&#32;Reverted to version as of 11:39, 18 January 2012</p>
<hr />
<div>SUMATRA Architecture</div>Wed, 18 Jan 2012 11:42:05 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:Sumatra.jpgFile:Sumatra.jpg
http://www.di.uniba.it/~kdde/index.php/File:Sumatra.jpg
<p>AnnaCiampi: uploaded a new version of "File:Sumatra.jpg"</p>
<hr />
<div>SUMATRA Architecture</div>Wed, 18 Jan 2012 11:40:58 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:Sumatra.jpgFile:Sumatra.jpg
http://www.di.uniba.it/~kdde/index.php/File:Sumatra.jpg
<p>AnnaCiampi: uploaded a new version of "File:Sumatra.jpg":&#32;Reverted to version as of 16:37, 3 December 2010</p>
<hr />
<div>SUMATRA Architecture</div>Wed, 18 Jan 2012 11:40:27 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:Sumatra.jpgFile:Sumatra.jpg
http://www.di.uniba.it/~kdde/index.php/File:Sumatra.jpg
<p>AnnaCiampi: uploaded a new version of "File:Sumatra.jpg"</p>
<hr />
<div>SUMATRA Architecture</div>Wed, 18 Jan 2012 11:39:28 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:Sumatra.jpgSUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' '''D''' (e.g., sensed data), in which sets of observations for a ''numeric'' dimension are transmitted at equally spaced time points from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal of SUMATRA is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive, equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values at the equally spaced time points of the window. <br />
<br />
<br />
Mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process thus consists of three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Download <br />
**the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]), which requires an XML input file of parameters ([[File:BuildingParameters.txt]]), to build the graph, <br />
**the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]), and <br />
**the SQL script for the data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds], [[File:NodeB.csv]]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Tue, 18 Jan 2011 14:20:22 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRASUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' '''D''' (e.g., sensed data), in which sets of observations for a ''numeric'' dimension are transmitted at equally spaced time points from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal of SUMATRA is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive, equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values at the equally spaced time points of the window. <br />
<br />
<br />
Mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process thus consists of three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Trend cluster discovery in SUMATRA takes two input parameters: the number $w$ of snapshots that compose a window ($w>1$) and the domain similarity threshold $\delta$ ($\delta \in [0,1]$). <br />
Polyline compression takes either the error threshold $\epsilon$ or the compression degree threshold $\sigma$ as its input parameter.<br />
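<br />
The constraints on these parameters can be written down as a small validation sketch. The class name <code>SumatraParameters</code> and its field names are hypothetical, and reading "either ... or" as "exactly one of the two" is an assumption; the actual parameter files of the distribution may be organised differently.<br />

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SumatraParameters:
    """Hypothetical container for the input parameters described above."""
    w: int                            # snapshots per window, w > 1
    delta: float                      # domain similarity threshold in [0, 1]
    epsilon: Optional[float] = None   # error threshold for compression
    sigma: Optional[float] = None     # compression degree threshold

    def __post_init__(self):
        if self.w <= 1:
            raise ValueError("w must be greater than 1")
        if not 0.0 <= self.delta <= 1.0:
            raise ValueError("delta must lie in [0, 1]")
        # Assumption: exactly one compression threshold is supplied.
        if (self.epsilon is None) == (self.sigma is None):
            raise ValueError("give exactly one of epsilon and sigma")


params = SumatraParameters(w=4, delta=0.1, epsilon=0.05)
```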
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be run on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Download <br />
**the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) to build the graph, which requires an XML input file of parameters ([[File:BuildingParameters.txt]]), <br />
**the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]), and <br />
**the SQL script for the data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds], [[File:nodesB.csv]]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Tue, 18 Jan 2011 14:19:03 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRASUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values at the equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive, equally spaced in time, and passes them window by window to SUMATRA. <br />
<br />
After a window has gone through SUMATRA, it is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process therefore has three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Trend cluster discovery in SUMATRA takes two input parameters: the number $w$ of snapshots that compose a window ($w>1$) and the domain similarity threshold $\delta$ ($\delta \in [0,1]$). <br />
Polyline compression takes either the error threshold $\epsilon$ or the compression degree threshold $\sigma$ as its input parameter.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be run on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Download <br />
**the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) to build the graph, which requires an XML input file of parameters ([[File:BuildingParameters.txt]]), <br />
**the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]), and <br />
**the SQL script for the data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds], [[File:NodesB.csv]]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Tue, 18 Jan 2011 14:18:48 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRAFile:NodeB.csv
http://www.di.uniba.it/~kdde/index.php/File:NodeB.csv
<p>AnnaCiampi: </p>
<hr />
<div></div>Tue, 18 Jan 2011 14:18:09 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:NodeB.csvSUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values at the equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive, equally spaced in time, and passes them window by window to SUMATRA. <br />
<br />
After a window has gone through SUMATRA, it is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process therefore has three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Trend cluster discovery in SUMATRA takes two input parameters: the number $w$ of snapshots that compose a window ($w>1$) and the domain similarity threshold $\delta$ ($\delta \in [0,1]$). <br />
Polyline compression takes either the error threshold $\epsilon$ or the compression degree threshold $\sigma$ as its input parameter.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be run on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Download <br />
**the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) to build the graph, which requires an XML input file of parameters ([[File:BuildingParameters.txt]]), <br />
**the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]), and <br />
**the SQL script for the data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Tue, 18 Jan 2011 14:16:26 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRASUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values at the equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive, equally spaced in time, and passes them window by window to SUMATRA. <br />
<br />
After a window has gone through SUMATRA, it is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process therefore has three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Trend cluster discovery in SUMATRA takes two input parameters: the number $w$ of snapshots that compose a window ($w>1$) and the domain similarity threshold $\delta$ ($\delta \in [0,1]$). <br />
Polyline compression takes either the error threshold $\epsilon$ or the compression degree threshold $\sigma$ as its input parameter.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be run on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Read the readme.txt file for installation instructions.<br />
*Download the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) to build the graph, which requires an XML input file of parameters ([[File:BuildingParameters.txt]]); the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]); and the SQL script for the data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Tue, 18 Jan 2011 14:14:59 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRAFile:BuildingParameters.txt
http://www.di.uniba.it/~kdde/index.php/File:BuildingParameters.txt
<p>AnnaCiampi: </p>
<hr />
<div></div>Tue, 18 Jan 2011 14:14:19 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:BuildingParameters.txtSUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values at the equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive, equally spaced in time, and passes them window by window to SUMATRA. <br />
<br />
After a window has gone through SUMATRA, it is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process therefore has three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Trend cluster discovery in SUMATRA takes two input parameters: the number $w$ of snapshots that compose a window ($w>1$) and the domain similarity threshold $\delta$ ($\delta \in [0,1]$). <br />
Polyline compression takes either the error threshold $\epsilon$ or the compression degree threshold $\sigma$ as its input parameter.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be run on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Read the readme.txt file for installation instructions.<br />
*Download the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) to build the graph; the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, which requires an XML input file of parameters ([[File:ClusteringParameters.txt]]); and the SQL script for the data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Tue, 18 Jan 2011 14:12:51 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRAFile:ClusteringParameters.txt
http://www.di.uniba.it/~kdde/index.php/File:ClusteringParameters.txt
<p>AnnaCiampi: </p>
<hr />
<div></div>Tue, 18 Jan 2011 14:09:39 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:ClusteringParameters.txtSUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique that segments a geographically distributed data stream into consecutive equal-sized windows. Each time a window is completed, SUMATRA summarizes it, stores the summaries in a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values vary similarly over the ''time horizon'' of the window; this shared temporal variation is called the ''trend polyline''. The trend polyline is drawn as the sequence of straight-line segments that fit the clustered values at the equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive, equally spaced in time, and passes them window by window to SUMATRA. <br />
<br />
After a window has gone through SUMATRA, it is permanently discarded, while a compact representation of the discovered trend clusters is stored in a data warehouse. <br />
<br />
The summarization process therefore has three steps:<br />
<br />
*the snapshots that compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Trend cluster discovery in SUMATRA takes two input parameters: the number $w$ of snapshots that compose a window ($w>1$) and the domain similarity threshold $\delta$ ($\delta \in [0,1]$). <br />
Polyline compression takes either the error threshold $\epsilon$ or the compression degree threshold $\sigma$ as its input parameter.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be run on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Read the readme.txt file for installation instructions.<br />
*Download the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) to build the graph, the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA system, and the SQL script for the data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureSA.gds.zip temperatureSA.gds], [[File:NodesSA.csv]], [[File:AirclimateBuildingParameters.xml]], [[File:AirclimateClusteringParameters.xml]]).<br />
<br />
Please email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Tue, 18 Jan 2011 14:06:47 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRASUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The scope of is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored into a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique which segments a geographically distributed data stream into consecutive equal sized windows. Each time a new window is completed, SUMATRA summarizes the window, stores the summaries into a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal patterns, called ''trend clusters''. These patterns are ''spatial clusters'' of sources which transmit values, whose temporal variation, called ''trend polyline'', is similar over the ''time horizon'' of the window. This trend polyline is drawn as the sequence of straight-line segments which fit clustered values as they are transmitted at equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA in order to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
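<br />
To make the compress-then-reconstruct idea concrete, here is a minimal sketch of Haar wavelet compression applied to the values of a polyline. It uses the plain averages-and-differences form of the transform and keeps only the largest coefficients. The function names and the <code>keep</code> parameter are hypothetical, and the window length is assumed to be a power of two; how SUMATRA itself selects coefficients is not described here.<br />

```python
def haar_forward(values):
    """Haar decomposition by pairwise averages and differences.

    Assumes len(values) is a power of two.
    Returns [overall average, coarsest detail, ..., finest details].
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = [float(v) for v in values]
    details = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details
        current = [(a + b) / 2 for a, b in pairs]
    return current + details


def haar_inverse(coeffs):
    """Rebuild the values from coefficients produced by haar_forward."""
    current = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(current)]
        pos += len(current)
        rebuilt = []
        for avg, diff in zip(current, level):
            rebuilt.extend((avg + diff, avg - diff))
        current = rebuilt
    return current


def keep_largest(coeffs, keep):
    """Zero all but the `keep` coefficients of largest magnitude."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(ranked[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


polyline_values = [4, 6, 10, 12]
coeffs = haar_forward(polyline_values)
print(coeffs)                                 # [8.0, -3.0, -1.0, -1.0]
print(haar_inverse(keep_largest(coeffs, 2)))  # [5.0, 5.0, 11.0, 11.0]
```

Storing two coefficients instead of four values reconstructs the polyline with an absolute error of 1 at each point, which is the kind of trade-off between size and accuracy that the compression step controls.<br />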
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in the data warehouse. <br />
<br />
This way, the summarization process is three-stepped:<br />
<br />
*snapshots which compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
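<br />
The three steps above can be sketched as a simple loop. This is only an outline under stated assumptions: <code>summarize</code> and <code>store</code> are placeholder callables standing in for trend cluster discovery and for the data warehouse, and the example uses <code>sum</code> as a stand-in summary.<br />

```python
def summarize_stream(snapshots, w, summarize, store):
    """Buffer snapshots window by window, summarize, store, discard.

    snapshots -- iterable of snapshots arriving equally spaced in time
    w         -- number of snapshots per window (w > 1)
    summarize -- placeholder for trend cluster discovery and compression
    store     -- placeholder for writing a summary to the data warehouse
    Returns the snapshots of the last, still incomplete window.
    """
    if w <= 1:
        raise ValueError("w must be greater than 1")
    buffer = []
    for snapshot in snapshots:
        buffer.append(snapshot)          # step 1: buffer into the synopsis
        if len(buffer) == w:
            summary = summarize(buffer)  # step 2: compute the summary
            store(summary)               # step 3: store the summary ...
            buffer = []                  # ... and discard the window
    return buffer


warehouse = []
leftover = summarize_stream(range(7), w=3, summarize=sum, store=warehouse.append)
print(warehouse)  # [3, 12]
print(leftover)   # [6]
```

Only the summaries reach the warehouse; the raw snapshots of a completed window are dropped as soon as its summary is stored.<br />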
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
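<br />
The constraints on these parameters can be written down as a small validation sketch. The class name <code>SumatraParameters</code> and its field names are hypothetical, and reading "either ... or" as "exactly one of the two" is an assumption of this example.<br />

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SumatraParameters:
    """Hypothetical container for the input parameters described above."""
    w: int                            # snapshots per window, w > 1
    delta: float                      # domain similarity threshold in [0, 1]
    epsilon: Optional[float] = None   # error threshold for compression
    sigma: Optional[float] = None     # compression degree threshold

    def __post_init__(self):
        if self.w <= 1:
            raise ValueError("w must be greater than 1")
        if not 0.0 <= self.delta <= 1.0:
            raise ValueError("delta must lie in [0, 1]")
        # Assumed reading: exactly one compression threshold is given.
        if (self.epsilon is None) == (self.sigma is None):
            raise ValueError("give either epsilon or sigma, not both or neither")


params = SumatraParameters(w=8, delta=0.1, epsilon=0.05)
print(params)
```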
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Read the (readme.txt) file for the installation<br />
*Download the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) to build the graph, the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA System and the sql script for data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([[File:TemperatureSA.gds]], [[File:NodesSA.csv]], [[File:AirclimateBuildingParameters.xml]], [[File:AirclimateClusteringParameters.xml]]).<br />
<br />
Please, email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Tue, 18 Jan 2011 14:03:22 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRASUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal of SUMATRA is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique which segments a geographically distributed data stream into consecutive equal sized windows. Each time a new window is completed, SUMATRA summarizes the window, stores the summaries into a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values show a similar temporal variation, called the ''trend polyline'', over the ''time horizon'' of the window. The trend polyline is drawn as the sequence of straight-line segments which fit the clustered values as they are transmitted at the equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA in order to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in the data warehouse. <br />
<br />
This way, the summarization process is three-stepped:<br />
<br />
*snapshots which compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Read the (readme.txt) file for the installation<br />
*Download the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) to build the graph, the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA System and the sql script for data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] ([http://www.di.uniba.it/~ciampi/Dataset/temperatureB.gds.zip temperatureB.gds], [http://www.di.uniba.it/~ciampi/Dataset/humidityB.gds.zip humidityB.gds]).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([[File:TemperatureSA.gds]], [[File:NodesSA.csv]], [[File:AirclimateBuildingParameters.xml]], [[File:AirclimateClusteringParameters.xml]]).<br />
<br />
Please, email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Tue, 18 Jan 2011 14:02:13 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRASUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal of SUMATRA is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique which segments a geographically distributed data stream into consecutive equal sized windows. Each time a new window is completed, SUMATRA summarizes the window, stores the summaries into a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values show a similar temporal variation, called the ''trend polyline'', over the ''time horizon'' of the window. The trend polyline is drawn as the sequence of straight-line segments which fit the clustered values as they are transmitted at the equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA in order to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in the data warehouse. <br />
<br />
This way, the summarization process is three-stepped:<br />
<br />
*snapshots which compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Read the (readme.txt) file for the installation<br />
*Download the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) to build the graph, the runnable distribution package ([[File:sumatra.jar|sumatra.jar]]) of the SUMATRA System and the sql script for data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] (temperatureB.gds, humidityB.gds).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([[File:TemperatureSA.gds]], [[File:NodesSA.csv]], [[File:AirclimateBuildingParameters.xml]], [[File:AirclimateClusteringParameters.xml]]).<br />
<br />
Please, email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Tue, 18 Jan 2011 13:52:33 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRAFile:Sumatra.jar
http://www.di.uniba.it/~kdde/index.php/File:Sumatra.jar
<p>AnnaCiampi: The runnable file of the framework.</p>
<hr />
<div>The runnable file of the framework.</div>Tue, 18 Jan 2011 13:51:38 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:Sumatra.jarFile:Dwscript.txt
http://www.di.uniba.it/~kdde/index.php/File:Dwscript.txt
<p>AnnaCiampi: uploaded a new version of "File:Dwscript.txt"</p>
<hr />
<div></div>Wed, 15 Dec 2010 11:04:09 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:Dwscript.txtSUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal of SUMATRA is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique which segments a geographically distributed data stream into consecutive equal sized windows. Each time a new window is completed, SUMATRA summarizes the window, stores the summaries into a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values show a similar temporal variation, called the ''trend polyline'', over the ''time horizon'' of the window. The trend polyline is drawn as the sequence of straight-line segments which fit the clustered values as they are transmitted at the equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA in order to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in the data warehouse. <br />
<br />
This way, the summarization process is three-stepped:<br />
<br />
*snapshots which compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Read the (readme.txt) file for the installation<br />
*Download the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) to build the graph, the runnable distribution package (sumatra.jar) of the SUMATRA System and the sql script for data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] (temperatureB.gds, humidityB.gds).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] ([[File:TemperatureSA.gds]], [[File:NodesSA.csv]], [[File:AirclimateBuildingParameters.xml]], [[File:AirclimateClusteringParameters.xml]]).<br />
<br />
Please, email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Mon, 06 Dec 2010 15:14:08 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRAFile:NodesSA.csv
http://www.di.uniba.it/~kdde/index.php/File:NodesSA.csv
<p>AnnaCiampi: </p>
<hr />
<div></div>Mon, 06 Dec 2010 15:12:13 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:NodesSA.csvSUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal of SUMATRA is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique which segments a geographically distributed data stream into consecutive equal sized windows. Each time a new window is completed, SUMATRA summarizes the window, stores the summaries into a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values show a similar temporal variation, called the ''trend polyline'', over the ''time horizon'' of the window. The trend polyline is drawn as the sequence of straight-line segments which fit the clustered values as they are transmitted at the equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA in order to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in the data warehouse. <br />
<br />
This way, the summarization process is three-stepped:<br />
<br />
*snapshots which compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Read the (readme.txt) file for the installation<br />
*Download the runnable distribution package ([[File:BuildNetwork.jar|BuildNetwork.jar]]) to build the graph, the runnable distribution package (sumatra.jar) of the SUMATRA System and the sql script for data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] (temperatureB.gds, humidityB.gds).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] (temperatureSA.gds).<br />
<br />
Please, email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Mon, 06 Dec 2010 15:08:12 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRAFile:BuildNetwork.jar
http://www.di.uniba.it/~kdde/index.php/File:BuildNetwork.jar
<p>AnnaCiampi: </p>
<hr />
<div></div>Mon, 06 Dec 2010 15:06:59 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/File_talk:BuildNetwork.jarSUMATRA
http://www.di.uniba.it/~kdde/index.php/SUMATRA
<p>AnnaCiampi: /* The distribution package */</p>
<hr />
<div>{{stub}}<br />
==SUMATRA==<br />
The scenario is that of a '''geographically distributed data stream''' (e.g., sensed data) '''D''' where sets of observations for a ''numeric'' dimension are transmitted equally spaced in time from a (variable) number of georeferenced sources (e.g., sensors). <br />
<br />
The goal of SUMATRA is to find a compact and accurate representation '''P''' of '''D''' such that '''P''' is stored in a data warehouse '''DW''' in place of '''D''' and '''D''' can be accurately predicted by means of '''P'''. <br />
<br />
===A short description===<br />
<br />
SUMATRA ('''SUM'''m'''A'''rization by '''TR'''end cluster discovery '''A'''lgorithm) is a summarization technique which segments a geographically distributed data stream into consecutive equal sized windows. Each time a new window is completed, SUMATRA summarizes the window, stores the summaries into a data warehouse (''roll-up'') and discards the window. <br />
<br />
<br />
As summaries, SUMATRA computes a new kind of spatio-temporal pattern, called a ''trend cluster''. A trend cluster is a ''spatial cluster'' of sources whose transmitted values show a similar temporal variation, called the ''trend polyline'', over the ''time horizon'' of the window. The trend polyline is drawn as the sequence of straight-line segments which fit the clustered values as they are transmitted at the equally spaced time points of the window. <br />
<br />
<br />
Several mathematical techniques (trend-based sampling) and signal compression techniques (Discrete Fourier Transform or Haar wavelet) are integrated in SUMATRA in order to derive a compact representation of trend polylines.<br />
<br />
<br />
Once stored in the data warehouse, a trend cluster can be retrieved and used to approximately reconstruct the corresponding summarized portion of the data stream (''drill-down'').<br />
<br />
[[File:sumatra.jpg]]<br />
<br />
===Architecture===<br />
<br />
A buffer consumes snapshots as they arrive equally spaced in time and pours them window-by-window into SUMATRA. <br />
<br />
After a window goes through SUMATRA, the window is permanently discarded, while a compact representation of the discovered trend clusters is stored in the data warehouse. <br />
<br />
This way, the summarization process is three-stepped:<br />
<br />
*snapshots which compose a window are buffered into the data synopsis;<br />
*trend clusters are computed;<br />
*the window is discarded from the data synopsis, while a compact representation of the trend clusters is computed and stored in the data warehouse.<br />
<br />
Input parameters of trend cluster discovery in SUMATRA are the number of snapshots which compose a window $w$ ($w>1$) and the domain similarity threshold $\delta$ ($\delta$ in $[0,1]$). <br />
Input parameters of the polyline compression are either the error threshold $\epsilon$ or the compression degree threshold $\sigma$.<br />
<br />
===The distribution package===<br />
SUMATRA is provided as a .jar file and can be executed on any machine with a JVM (1.6) and MySQL (5 or higher) installed.<br />
<br />
*Read the (readme.txt) file for the installation<br />
*Download the runnable distribution package (buildNetwork.jar) to build the graph, the runnable distribution package (sumatra.jar) of the SUMATRA System and the sql script for data warehouse ([[MEDIA:dwscript.txt|dwscript.sql]]).<br />
*Geographically distributed data streams<br />
**[http://db.csail.mit.edu/labdata/labdata.html Berkeley Intel Lab Data] (temperatureB.gds, humidityB.gds).<br />
**[http://climate.geog.udel.edu/~climate/html_pages/archive.html South American Air Climate] (temperatureSA.gds).<br />
<br />
Please, email the project team for instructions.<br />
<br />
Warning: SUMATRA is free for evaluation, research and teaching purposes, but not for commercial purposes.<br />
<br />
'''Please Acknowledge'''<br />
<br />
===Project Team===<br />
<br />
*[http://www.di.uniba.it/~ciampi/ Anna Ciampi] (aciampi@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~appice/ Annalisa Appice] (appice@di.uniba.it)<br />
<br />
*[http://www.di.uniba.it/~malerba/ Donato Malerba] (malerba@di.uniba.it)<br />
<br />
External research collaborators: <br />
<br />
*[http://dee.poliba.it/guccioneweb/index.html Pietro Guccione] (p.guccione@poliba.it)<br />
<br />
Students involved in the project: <br />
<br />
*Giuseppe Saponaro, Domenico Triglione<br />
<br />
<br />
<br />
===Related publications===<br />
<br />
<br />
A. Ciampi, A. Appice, and D. Malerba, ''Summarization for geographically<br />
distributed data streams'', in Proceedings of the 14th International<br />
Conference on Knowledge-Based and Intelligent Information and Engineering<br />
Systems, KES 2010, ser. LNCS, vol. 6278. Springer-Verlag, 2010, pp. 339–348.</div>Mon, 06 Dec 2010 14:27:56 GMTAnnaCiampihttp://www.di.uniba.it/~kdde/index.php/Talk:SUMATRA