In data mining and knowledge discovery, the expert is usually
confronted with the task of constructing a predictive model or a set of
descriptive patterns from the available data. In early approaches to
data mining, simple tabular data was used as input to data analysis.
Recent approaches, however, address much more complex data mining
tasks, including relational data mining, text mining and mining of
heterogeneous information networks. The availability of large amounts
of semantically annotated data in all domains of science, and biology
in particular, calls for new data mining approaches that can deal
with increased data complexity, the relational character of semantic
representations, and the reasoning capacities of the underlying
ontologies. This talk addresses semantic
data mining [1,2], an emerging data mining task in which semantic
data in the form of domain ontologies is used as background knowledge
in data analytics. The talk introduces a general framework for
semantic data mining, followed by the presentation of the SEGS method
for semantic subgroup discovery from microarray data [3,4], and two
general methods for semantic subgroup discovery, SDM-SEGS and
SDM-Aleph [2], implemented as reusable workflows in the Orange4WS data
mining environment [5]. The use of the described methods and tools is
illustrated on selected biomedical applications.
[1]
Nada Lavrač, Anže Vavpetič, Melanie Hilario, Alexandros Kalousis,
Agnieszka Lawrynowicz and Jedrzej Potoniec, “Tutorial on Semantic
Data Mining” at ECML/PKDD-2011, Athens, 9th September 2011,
http://semantic.cs.put.poznan.pl/SDM-tutorial2011/doku.php
[2]
A. Vavpetič and N. Lavrač, “Semantic Subgroup Discovery Systems
and Workflows in the SDM-Toolkit”, The Computer Journal, 2012; doi:
10.1093/comjnl/bxs057.
[3]
I. Trajkovski, N. Lavrač, and J. Tolar, “SEGS: Search for Enriched
Gene Sets in Microarray Data”, Journal of Biomedical Informatics,
2008, 41(4), pp. 588–601.
[4]
V. Podpečan, N. Lavrač, I. Mozetič et al., “SegMine workflows for
semantic microarray data analysis in Orange4WS”, BMC Bioinformatics,
2011, 12(416).
[5]
V. Podpečan, M. Zemenova, and N. Lavrač, “Orange4WS Environment
for Service-Oriented Data Mining”, The Computer Journal, 2012, 55,
pp. 82–98.