MTA

MeSH Term Associator

LACAM @ Dipartimento di Informatica - Università degli Studi di Bari - Via Orabona, 4 - 70126 Bari

A short description

MTA is a data mining tool that discovers association rules in biomedical text corpora. It imports MeSH (Medical Subject Headings) taxonomies together with a set of abstracts published on MedLine, and discovers associations at different levels of abstraction (generalized association rules). Both automatic and semiautomatic approaches can be applied to structure the set of discovered rules and filter out uninteresting ones. In the automatic approach, rules are filtered without using user knowledge; in the semiautomatic approach, the user's domain knowledge guides the exploration of the discovered rules. Discovered association rules can be imported and exported in PMML. Similarities between discovered association rules can be explored visually through a multidimensional analysis technique.
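To give an idea of what a PMML export of association rules can look like, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is not MTA's own exporter: the function rules_to_pmml, the sample rule and the version attribute are assumptions made for illustration. The output is a simplified fragment built around PMML's AssociationModel element and omits the Header, DataDictionary and MiningSchema parts that a schema-valid document would need.

```python
import xml.etree.ElementTree as ET

def rules_to_pmml(rules, n_transactions):
    """Serialize rules to a simplified PMML-like fragment (hypothetical helper).

    rules: list of (antecedent, consequent, support, confidence),
    where antecedent and consequent are frozensets of MeSH terms.
    """
    # Assign an id to every distinct item and every distinct itemset.
    items = sorted({term for a, c, _, _ in rules for term in a | c})
    item_id = {term: str(i) for i, term in enumerate(items, start=1)}
    itemsets = []
    for a, c, _, _ in rules:
        for s in (a, c):
            if s not in itemsets:
                itemsets.append(s)
    itemset_id = {s: str(i) for i, s in enumerate(itemsets, start=1)}

    pmml = ET.Element("PMML", version="4.4")  # version chosen for illustration
    model = ET.SubElement(
        pmml, "AssociationModel",
        functionName="associationRules",
        numberOfTransactions=str(n_transactions),
        numberOfItems=str(len(items)),
        numberOfItemsets=str(len(itemsets)),
        numberOfRules=str(len(rules)),
    )
    # Items first, then itemsets, then the rules that refer to them.
    for term in items:
        ET.SubElement(model, "Item", id=item_id[term], value=term)
    for s in itemsets:
        node = ET.SubElement(model, "Itemset", id=itemset_id[s])
        for term in sorted(s):
            ET.SubElement(node, "ItemRef", itemRef=item_id[term])
    for a, c, sup, conf in rules:
        ET.SubElement(
            model, "AssociationRule",
            antecedent=itemset_id[a], consequent=itemset_id[c],
            support=str(sup), confidence=str(conf),
        )
    return ET.tostring(pmml, encoding="unicode")

# Made-up rule: Insulin -> Diabetes Mellitus
sample_rules = [(frozenset({"Insulin"}), frozenset({"Diabetes Mellitus"}), 0.3, 0.9)]
xml_text = rules_to_pmml(sample_rules, n_transactions=100)
print(xml_text)
```

Keeping items, itemsets and rules as separate elements that refer to each other by id is what lets a reader of the file rebuild each rule without repeating the term strings.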

The functional architecture

The MTA architecture follows the standard KDD (Knowledge Discovery in Databases) process, which consists of the following steps:

Data Collection. MTA is integrated in a distributed framework that interfaces with the remote PubMed database through the IBM Web Services for Life Sciences. A user query is run directly against PubMed, and the list of relevant abstracts is returned and downloaded.

Data Selection and Pre-processing. This step prepares both the data to be mined and the data to be used as background knowledge.

Input data consist of sets of abstracts of scientific publications returned by PubMed queries. Texts are annotated by the BioTeKS Text Analysis Engine (TAE), provided within the IBM UIM Architecture, using a local dictionary of MeSH terms. Feature selection techniques are then used to choose the relevant items (i.e., MeSH terms). Each query generates a single table in a relational database, where each transaction corresponds to an individual abstract and each attribute to a selected MeSH term.
Background knowledge consists of MeSH hierarchies. Supported operations include converting taxonomies into the MTA format and selecting the portions of interest by means of pruning and recovering operations.
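The table layout described above (one transaction per abstract, one attribute per selected MeSH term) can be sketched as follows. This is only an illustration under stated assumptions: the annotated abstracts are made up, the helper names select_terms and build_table are hypothetical, and document-frequency thresholds stand in for whatever feature selection technique MTA actually applies.

```python
from collections import Counter

# Hypothetical annotator output: abstract identifier -> MeSH terms found in it.
annotated = {
    "pmid_1": {"Insulin", "Diabetes Mellitus", "Humans"},
    "pmid_2": {"Insulin", "Obesity", "Humans"},
    "pmid_3": {"Diabetes Mellitus", "Obesity", "Humans"},
    "pmid_4": {"Insulin", "Diabetes Mellitus", "Humans"},
}

def select_terms(annotated, min_df, max_df):
    """Keep terms whose document frequency lies in [min_df, max_df].

    Assumption: very rare and near-universal terms are treated as
    uninformative. This is a stand-in for MTA's feature selection.
    """
    counts = Counter(term for terms in annotated.values() for term in terms)
    n = len(annotated)
    return sorted(t for t, c in counts.items() if min_df <= c / n <= max_df)

def build_table(annotated, columns):
    """One row per abstract, one boolean attribute per selected term."""
    return {
        pmid: [term in terms for term in columns]
        for pmid, terms in annotated.items()
    }

columns = select_terms(annotated, min_df=0.5, max_df=0.9)
table = build_table(annotated, columns)

print(columns)
for pmid, row in table.items():
    print(pmid, row)
```

In this toy data "Humans" occurs in every abstract, so the upper threshold drops it: a term present in all transactions cannot distinguish one abstract from another.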

Data Mining. The mining step performs both flat and generalized association rule discovery among the abstracts returned by a PubMed query. Discovered association rules capture recurrent patterns in the texts that may reveal relations among biomedical concepts.
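The difference between flat and generalized rules can be illustrated with a small sketch. It assumes a common textbook approach (each transaction is extended with the taxonomy ancestors of its items before support and confidence are computed) and is not a description of MTA's actual algorithm. The toy taxonomy, the transactions and the function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy taxonomy (child -> parent), loosely modelled on MeSH trees.
PARENT = {
    "Diabetes Mellitus, Type 1": "Diabetes Mellitus",
    "Diabetes Mellitus, Type 2": "Diabetes Mellitus",
    "Insulin": "Pancreatic Hormones",
    "Glucagon": "Pancreatic Hormones",
}

def ancestors(term):
    """All ancestors of a term in the toy taxonomy."""
    found = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found

def extend(transaction):
    """Add every ancestor of every item, so rules can hold at higher levels."""
    extended = set(transaction)
    for term in transaction:
        extended |= ancestors(term)
    return extended

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def mine_rules(transactions, min_support, min_confidence):
    """Single-item rules x -> y over taxonomy-extended transactions."""
    extended = [extend(t) for t in transactions]
    items = sorted(set().union(*extended))
    rules = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            # A rule between a term and its own ancestor is trivially true.
            if x in ancestors(y) or y in ancestors(x):
                continue
            s = support({x, y}, extended)
            if s < min_support:
                continue
            c = s / support({x}, extended)
            if c >= min_confidence:
                rules.append((x, y, s, c))
    return rules

transactions = [
    {"Diabetes Mellitus, Type 2", "Insulin"},
    {"Diabetes Mellitus, Type 1", "Insulin"},
    {"Diabetes Mellitus, Type 2", "Glucagon"},
    {"Diabetes Mellitus, Type 1", "Glucagon"},
]
rules = mine_rules(transactions, min_support=0.75, min_confidence=0.9)
for x, y, s, c in rules:
    print(f"{x} -> {y} (support={s:.2f}, confidence={c:.2f})")
```

In this toy data no pair of leaf terms reaches the support threshold, yet the association between the two parent concepts holds in every transaction. This is the kind of pattern that only becomes visible at a higher level of abstraction.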

Interpretation and Evaluation. Since the number of discovered association rules is usually high and most of them do not meet user expectations, some filtering and browsing techniques are available. There are four main criteria: rule templates, rule covers, statistical rating and specificity. The first allows the end user to specify knowledge of interest that rules should or should not match. The second selects groups of redundant rules, while the third identifies statistically interesting rules. Finally, the last technique treats the set of discovered rules as a set of subspaces of rules, where a representative rule can be identified for each subspace.
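Two of these criteria, rule templates and statistical rating, can be sketched in a few lines. The sketch rests on assumptions: a template is modelled as a set of terms required in the antecedent and in the consequent, and lift is used as the statistical measure, although the text above does not say which measure MTA uses. The Rule type, the helper names and the numbers are hypothetical.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float
    consequent_support: float  # fraction of abstracts containing the consequent

def matches(rule, template):
    """True if the rule contains the terms the template requires on each side."""
    return (template["antecedent"] <= rule.antecedent
            and template["consequent"] <= rule.consequent)

def apply_templates(rules, include=(), exclude=()):
    """Keep rules matching some inclusive template (if any) and no exclusive one."""
    kept = []
    for rule in rules:
        if include and not any(matches(rule, t) for t in include):
            continue
        if any(matches(rule, t) for t in exclude):
            continue
        kept.append(rule)
    return kept

def lift(rule):
    """Values well above 1 mean the consequent is more likely given the antecedent."""
    return rule.confidence / rule.consequent_support

# Made-up rules for illustration only.
rules = [
    Rule(frozenset({"Insulin"}), frozenset({"Diabetes Mellitus"}), 0.30, 0.90, 0.40),
    Rule(frozenset({"Insulin"}), frozenset({"Humans"}), 0.32, 0.95, 0.95),
    Rule(frozenset({"Obesity"}), frozenset({"Diabetes Mellitus"}), 0.20, 0.60, 0.40),
]

# Inclusive template: the antecedent must mention Insulin.
include = [{"antecedent": frozenset({"Insulin"}), "consequent": frozenset()}]
# Exclusive template: the consequent must not mention Humans.
exclude = [{"antecedent": frozenset(), "consequent": frozenset({"Humans"})}]

selected = apply_templates(rules, include, exclude)
ranked = sorted(rules, key=lift, reverse=True)

for rule in ranked:
    print(sorted(rule.antecedent), "->", sorted(rule.consequent), round(lift(rule), 2))
```

The rule with consequent "Humans" has high confidence but a lift of 1, which shows why confidence alone is a poor indicator of interest when the consequent occurs in almost every abstract.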

A framework for MTA in PubMed query expansion tasks.

The distribution package

MTA is an application that runs under Windows 98 or higher.
Download the distribution package (mta.zip, 60.5 MB) and unzip it into a temporary directory.
Sample datasets are available in an MS Access database. MeSH taxonomies are stored in a separate MS Access database.
See the User Guide for further details about system requirements, installation and usage of the system.

Warning: MTA is free for evaluation, research and teaching purposes, but not for commercial use.

Please Acknowledge

Project team

Project Leader

Prof. Donato Malerba

LACAM Staff

Margherita Berardi

Corrado Loglisci

Previous members

Saverio D'Alessandro

Related publications

(in reverse chronological order)

M. Berardi, A. Appice, C. Loglisci, & P. Leo (2006). Supporting Visual Exploration of Discovered Association Rules Through Multi-Dimensional Scaling. In: F. Esposito, Z. W. Ras, D. Malerba, G. Semeraro (Eds.): Foundations of Intelligent Systems, 16th International Symposium, ISMIS 2006, Bari, Italy, September 27-29, 2006. Springer, LNCS 4203, 369-378.

M. Berardi, M. Lapi, P. Leo, & C. Loglisci (2005). Mining Generalized Association Rules on Biomedical Literature. In: M. Ali, F. Esposito (Eds.): Innovations in Applied Artificial Intelligence, 18th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE 2005, Bari, Italy, June 22-24, 2005, Proceedings. Springer-Verlag, LNCS 3533, 500-509.

M. Berardi, D. Malerba, C. Marinelli, P. Leo, C. Loglisci, & G. Scioscia (2005). A Text-Mining application able to mine association rules from biomedical texts. Annual Meeting of the Bioinformatic Italian Society, BITS 2005. Milan, March 17-19, 2005.

M. Berardi, M. Lapi, P. Leo, D. Malerba, C. Marinelli, & G. Scioscia (2004). A data mining approach to PubMed query refinement. 2nd International Workshop on Biological Data Management (BIDM 2004), in conjunction with DEXA 2004, Zaragoza, Spain, September 2, 2004, IEEE Computer Society, 401-405.

M. Berardi, M. Lapi, P. Leo, D. Malerba, C. Marinelli, & G. Scioscia (2004). A data mining approach for disease-genes relationship discovery in biomedical literature. KDNet Symposium on Knowledge-Based Services for the Public Sector: workshop on "Knowledge-based systems and services for health care". Bonn, Germany, June 3-4, 2004.

berardi@di.uniba.it