MTA

MeSH Term Associator

LACAM @ Dipartimento di Informatica - Università degli Studi di Bari - Via Orabona, 4 - 70126 Bari

A short description 
The functional architecture 
The distribution package 
Project team 
Related publications 


A short description

MTA is a data mining tool that discovers association rules in biomedical text corpora. It imports MeSH (Medical Subject Headings) taxonomies and a set of abstracts published on MedLine, and discovers associations at different levels of abstraction (generalized association rules). Both automatic and semi-automatic approaches can be applied to structure the set of discovered rules and filter out uninteresting ones. In the automatic approach, rules are filtered without using user knowledge, while in the semi-automatic approach the user's domain knowledge is exploited to strongly guide the exploration of the discovered rules. Discovered association rules can be imported and exported in PMML (Predictive Model Markup Language). Similarities between discovered association rules can be explored visually through a multidimensional analysis technique.
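Since discovered rules are exchanged in PMML, here is a minimal sketch of what such an export might look like, using Python's standard xml.etree.ElementTree to build a PMML 4.4 AssociationModel. The function rules_to_pmml, the field names "abstract" and "mesh", and the rule tuple layout are hypothetical choices for illustration; MTA's own PMML files may target a different PMML version and contain more detail.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"


def rules_to_pmml(rules, n_transactions, min_sup, min_conf):
    """Hypothetical export helper (not MTA's actual PMML writer).

    Each rule is (antecedent, consequent, support, confidence), where
    antecedent and consequent are tuples of MeSH term names.
    """
    items = sorted({i for a, c, _, _ in rules for i in (*a, *c)})
    item_id = {name: str(k) for k, name in enumerate(items, start=1)}
    itemsets = sorted({tuple(sorted(s)) for a, c, _, _ in rules for s in (a, c)})
    itemset_id = {s: str(k) for k, s in enumerate(itemsets, start=1)}

    pmml = ET.Element("PMML", version="4.4", xmlns=PMML_NS)
    ET.SubElement(pmml, "Header", description="Association rules (illustrative)")
    dd = ET.SubElement(pmml, "DataDictionary", numberOfFields="2")
    # Transactional layout: one (abstract, MeSH term) pair per record.
    ET.SubElement(dd, "DataField", name="abstract", optype="categorical", dataType="string")
    ET.SubElement(dd, "DataField", name="mesh", optype="categorical", dataType="string")

    model = ET.SubElement(
        pmml, "AssociationModel",
        functionName="associationRules",
        numberOfTransactions=str(n_transactions),
        minimumSupport=str(min_sup),
        minimumConfidence=str(min_conf),
        numberOfItems=str(len(items)),
        numberOfItemsets=str(len(itemsets)),
        numberOfRules=str(len(rules)),
    )
    schema = ET.SubElement(model, "MiningSchema")
    ET.SubElement(schema, "MiningField", name="abstract", usageType="group")
    ET.SubElement(schema, "MiningField", name="mesh", usageType="active")

    # Items, then itemsets referencing items, then rules referencing itemsets.
    for name in items:
        ET.SubElement(model, "Item", id=item_id[name], value=name)
    for s in itemsets:
        node = ET.SubElement(model, "Itemset", id=itemset_id[s], numberOfItems=str(len(s)))
        for name in s:
            ET.SubElement(node, "ItemRef", itemRef=item_id[name])
    for a, c, sup, conf in rules:
        ET.SubElement(
            model, "AssociationRule",
            support=str(sup), confidence=str(conf),
            antecedent=itemset_id[tuple(sorted(a))],
            consequent=itemset_id[tuple(sorted(c))],
        )
    return ET.tostring(pmml, encoding="unicode")


if __name__ == "__main__":
    example = [(("Hormones",), ("Diabetes Mellitus",), 0.5, 0.5)]
    print(rules_to_pmml(example, n_transactions=4, min_sup=0.5, min_conf=0.5))
```

Each rule points to itemsets by id rather than repeating term names, which keeps the file compact when many rules share the same antecedents or consequents.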



The functional architecture

The architecture developed in the MTA context follows the standard KDD (Knowledge Discovery in Databases) process. It consists of the following steps:
  • Data Collection. MTA is integrated into a distributed framework that interfaces with the remote PubMed database through the IBM Web Services for Life Sciences. A user query is run directly against PubMed, and the relevant abstracts are returned and downloaded.
  • Data Selection and Pre-processing. This step involves operations to prepare both data to be mined and data to be used as background knowledge.
    • Input data consist of sets of abstracts of scientific publications returned by PubMed queries. Texts are annotated by the BioTeKS Text Analysis Engine (TAE), provided within the IBM UIM (Unstructured Information Management) Architecture, using a local dictionary of MeSH terms. Feature selection techniques are then used to choose relevant items (i.e., MeSH terms). Each query generates a single table of a relational database, where each transaction corresponds to an individual abstract and each attribute to a selected MeSH term.
    • Background knowledge consists of MeSH hierarchies. Supported operations include converting taxonomies into the MTA format and selecting portions of taxonomies of interest by means of pruning and recovering operations.
  • Data Mining. The mining step performs both flat and generalized association rule discovery on the abstracts returned by a PubMed query. Discovered association rules capture recurrent patterns in texts that may reveal relations among biomedical concepts.
  • Interpretation and Evaluation. Since the number of discovered association rules is usually high and most of them do not meet user expectations, several filtering and browsing techniques are available, based on four main criteria: rule templates, rule covers, statistical rating and specificity. Rule templates let the end user specify knowledge of interest that rules should or should not match. Rule covers select groups of redundant rules, while statistical rating identifies statistically interesting rules. Finally, specificity lets the user view the set of discovered rules as a set of subspaces, each with an identifiable representative rule.
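The construction of the transaction table in the Data Selection and Pre-processing step might look roughly like the following sketch. It assumes the MeSH annotations produced by the text analysis engine are already available as a dictionary, uses a simple document-frequency threshold as a stand-in for the unspecified feature selection technique, and uses SQLite (Python's sqlite3) in place of the actual relational database; select_terms, build_table and the sample data are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical annotator output: PubMed ID -> MeSH terms found in the abstract.
# In MTA this annotation is done by the BioTeKS TAE; here it is simply given.
annotated = {
    "pmid1": ["Insulin", "Diabetes Mellitus", "Obesity"],
    "pmid2": ["Glucagon", "Diabetes Mellitus"],
    "pmid3": ["Insulin", "Diabetes Mellitus"],
    "pmid4": ["Leptin"],
}


def select_terms(docs, min_df):
    """Assumed feature selection: keep terms occurring in at least min_df abstracts."""
    df = Counter(term for terms in docs.values() for term in set(terms))
    return sorted(term for term, n in df.items() if n >= min_df)


def build_table(conn, name, docs, terms):
    """One table per query: one row per abstract, one 0/1 column per selected term."""
    columns = ["pmid TEXT PRIMARY KEY"] + [f'"{t}" INTEGER' for t in terms]
    conn.execute(f'CREATE TABLE "{name}" ({", ".join(columns)})')
    placeholders = ", ".join("?" * (len(terms) + 1))
    for pmid, found in docs.items():
        row = [pmid] + [int(t in found) for t in terms]
        conn.execute(f'INSERT INTO "{name}" VALUES ({placeholders})', row)
    conn.commit()


conn = sqlite3.connect(":memory:")
terms = select_terms(annotated, min_df=2)  # ["Diabetes Mellitus", "Insulin"]
build_table(conn, "query_1", annotated, terms)

if __name__ == "__main__":
    for row in conn.execute('SELECT * FROM "query_1" ORDER BY pmid'):
        print(row)  # ('pmid1', 1, 1), ('pmid2', 1, 0), ('pmid3', 1, 1), ('pmid4', 0, 0)
```

Terms that appear in only one abstract (Obesity, Glucagon, Leptin) are dropped, so the resulting table has one boolean column per surviving MeSH term.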

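The mining step can be made concrete with a small sketch of generalized association rule mining: each transaction (abstract) is extended with the ancestors of its MeSH terms, so that rules can relate concepts at different levels of the hierarchy. The toy hierarchy PARENT, the sample abstracts and all function names are hypothetical, not taken from MeSH or MTA, and the brute-force search over one-item rules merely stands in for a real miner such as an Apriori-style algorithm.

```python
# Toy child -> parent hierarchy in the style of MeSH (hypothetical,
# NOT the real MeSH tree).
PARENT = {
    "Insulin": "Hormones",
    "Glucagon": "Hormones",
    "Hormones": "Chemicals and Drugs",
    "Diabetes Mellitus": "Endocrine System Diseases",
    "Endocrine System Diseases": "Diseases",
}


def ancestors(term):
    """All ancestors of a term, walking up the hierarchy to the root."""
    result = []
    while term in PARENT:
        term = PARENT[term]
        result.append(term)
    return result


def extend(transaction):
    """Add every ancestor of every term, so rules can span abstraction levels."""
    extended = set(transaction)
    for term in transaction:
        extended.update(ancestors(term))
    return extended


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent + consequent) / support(antecedent)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


def mine_pair_rules(transactions, min_sup, min_conf):
    """Brute-force search of one-item => one-item rules over extended transactions."""
    items = sorted(set().union(*transactions))
    rules = []
    for a in items:
        for c in items:
            # Skip self-pairs and term/ancestor pairs, which hold trivially.
            if a == c or c in ancestors(a) or a in ancestors(c):
                continue
            s = support({a, c}, transactions)
            if s >= min_sup:
                conf = s / support({a}, transactions)
                if conf >= min_conf:
                    rules.append((a, c, s, conf))
    return rules


# Each abstract is the set of MeSH terms selected for it (hypothetical data).
abstracts = [
    {"Insulin", "Diabetes Mellitus"},
    {"Glucagon", "Diabetes Mellitus"},
    {"Insulin"},
    {"Glucagon", "Obesity"},
]
extended = [extend(a) for a in abstracts]

if __name__ == "__main__":
    for a, c, s, conf in mine_pair_rules(extended, min_sup=0.5, min_conf=0.5):
        print(f"{a} => {c}  (support={s:.2f}, confidence={conf:.2f})")
```

With min_sup=0.5, the specific rule Insulin => Diabetes Mellitus (support 0.25) is discarded, while its generalization Hormones => Diabetes Mellitus (support 0.5, confidence 0.5) is kept; this is the benefit of mining at several abstraction levels.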

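Two of the filtering criteria of the Interpretation and Evaluation step can be sketched as follows, under stated assumptions: rule templates are modelled as inclusive/restrictive pairs of required antecedent and consequent terms, and lift (confidence divided by consequent support) stands in for the statistical rating, whose exact test is not specified here. The Rule class, the function names and the sample rules are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    support: float       # fraction of abstracts containing both sides
    confidence: float    # support / support(antecedent)
    cons_support: float  # support(consequent), needed for lift


def matches(rule, template):
    """A template (antecedent terms, consequent terms) matches if all terms appear."""
    ante, cons = template
    return set(ante) <= rule.antecedent and set(cons) <= rule.consequent


def apply_templates(rules, inclusive=(), restrictive=()):
    """Keep rules matching some inclusive template (if any) and no restrictive one."""
    kept = []
    for r in rules:
        if inclusive and not any(matches(r, t) for t in inclusive):
            continue
        if any(matches(r, t) for t in restrictive):
            continue
        kept.append(r)
    return kept


def lift(rule):
    """Assumed rating: confidence / support(consequent); above 1 suggests positive correlation."""
    return rule.confidence / rule.cons_support


# Hypothetical rules, e.g. as produced by the mining step.
rules = [
    Rule(frozenset({"Insulin"}), frozenset({"Diabetes Mellitus"}), 0.25, 0.5, 0.5),
    Rule(frozenset({"Hormones"}), frozenset({"Diabetes Mellitus"}), 0.5, 0.5, 0.5),
    Rule(frozenset({"Obesity"}), frozenset({"Diabetes Mellitus"}), 0.25, 1.0, 0.5),
]

# User knowledge: rules must conclude Diabetes Mellitus and must not have
# Hormones in the antecedent.
kept = apply_templates(
    rules,
    inclusive=[((), ("Diabetes Mellitus",))],
    restrictive=[(("Hormones",), ())],
)
interesting = [r for r in kept if lift(r) > 1.0]

if __name__ == "__main__":
    for r in interesting:
        print(sorted(r.antecedent), "=>", sorted(r.consequent), f"lift={lift(r):.2f}")
```

Templates encode the user's knowledge and are applied first; the statistical rating then ranks or prunes whatever survives, which mirrors the semi-automatic versus automatic split described above.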

  • A framework for MTA in PubMed query expansion tasks.



The distribution package

MTA is an application running under Windows 98 or later.
Download the distribution package (mta.zip, 60.5 MB) and unzip it into a temporary directory.
Sample datasets are available in an MS Access database; MeSH taxonomies are stored in a separate MS Access database.
See the User Guide for further details about system requirements, installation and usage of the system.

Warning: MTA is free for evaluation, research and teaching purposes, but not for commercial purposes.

Please Acknowledge



Project team


Related publications

(in reverse chronological order)


berardi@di.uniba.it