Invited Talk

Nada Lavrač, Department of Knowledge Technologies, Jožef Stefan Institute, Slovenia

"Advances in Semantic Data Mining"

In data mining and knowledge discovery, the expert is usually confronted with the task of constructing a predictive model or a set of descriptive patterns from the available data. Early approaches to data mining took simple tabular data as input. Recent approaches, however, address much more complex tasks, including relational data mining, text mining, and the mining of heterogeneous information networks. The availability of large amounts of semantically annotated data in all domains of science, and in biology in particular, calls for new data mining approaches that can deal with increased data complexity, the relational character of semantic representations, and the reasoning capacities of the underlying ontologies. This talk addresses semantic data mining [1,2], an emerging data mining task in which semantic data in the form of domain ontologies is used as background knowledge in data analytics. The talk introduces a general framework for semantic data mining, then presents the SEGS method for semantic subgroup discovery from microarray data [3,4] and two general methods for semantic subgroup discovery, SDM-SEGS and SDM-Aleph [2], implemented as reusable workflows in the Orange4WS data mining environment [5]. The use of the described methods and tools is illustrated on selected biomedical applications.

[1] Nada Lavrač, Anže Vavpetič, Melanie Hilario, Alexandros Kalousis, Agnieszka Lawrynowicz and Jedrzej Potoniec, “Tutorial on Semantic Data Mining” at ECML/PKDD-2011, Athens, 9th September 2011.

[2] A. Vavpetič and N. Lavrač, “Semantic Subgroup Discovery Systems and Workflows in the SDM-Toolkit”, The Computer Journal, 2012; doi: 10.1093/comjnl/bxs057.

[3] I. Trajkovski, N. Lavrač, and J. Tolar, “SEGS: Search for Enriched Gene Sets in Microarray Data”, Journal of Biomedical Informatics, 2008, 41(4), pp. 588–601.

[4] V. Podpečan, N. Lavrač, I. Mozetič et al., “SegMine workflows for semantic microarray data analysis in Orange4WS”, BMC Bioinformatics, 2011, 12(416).

[5] V. Podpečan, M. Zemenova, and N. Lavrač, “Orange4WS Environment for Service-Oriented Data Mining”, The Computer Journal, 2012, 55, pp. 82–98.