WISDOM++

A WIse System for DOcument Management

 
LACAM @ Dipartimento di Informatica - Università degli Studi di Bari - Via Orabona, 4 -70126 Bari
 
A short description 
The functional architecture 
The distribution package 
Glossary 
Project team 
Related publications 
CORDIS Technology Marketplace 
FAQs


A short description

WISDOM++ is an intelligent document processing system that can transform paper documents into XML format.
Distinguishing features:
  • High adaptivity. WISDOM++ is a knowledge-based system capable of supplying assistance to the user during the document analysis and recognition process. The knowledge base, declaratively expressed in the form of decision trees or rules, is automatically built from a set of training documents using machine learning techniques.
  • Real-time user interaction. WISDOM++ has been designed as a multi-user system where each authorized user has his/her own rule base.
  • Multi-page document management. WISDOM++ processes the pages of a multi-page document independently of each other in all steps, since the optical scan function is able to work on a single page at a time. The sequence of pages in a multi-page document is defined by the user.



The functional architecture

The document analysis and recognition process in WISDOM++ consists of the following steps:
  • Page acquisition. Each page is scanned at a resolution of 300 dpi and thresholded into a binary image. At one bit per pixel, the bitmap of an A4-sized page (2,496 × 3,500 pixels) takes 2,496 × 3,500 / 8 = 1,092,000 bytes and is stored in TIFF format.
  • Document analysis.
  • Document Classification and Understanding. The problem of finding the logical structure of a document can be cast as the problem of defining a mapping from the layout structure into the logical one. In WISDOM++, this mapping is limited to the association of a page with a document class (document classification) and the association of page layout components with basic logical components (document understanding). The mapping is built by matching the document description against both models of classes of documents and models of the logical components of interest for that class.
  • OCR. WISDOM++ allows the user to set up the text extraction process by selecting the logical components to which OCR is to be applied.
  • Transformation into a web-accessible format. The web-accessible version (XML format) of the original document contains both the text returned by the OCR and the pictures extracted from the original bitmap and converted into JPEG format. Text and images are spatially arranged so that the XML reconstruction of the document is as faithful as possible to the original bitmap. Moreover, the XML format retains the information extracted during the document understanding phase, since the Document Type Definition (DTD) is specialized for each class of documents in order to represent its specific logical structure.
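The storage figure quoted for the page acquisition step can be checked with a little arithmetic. The sketch below assumes an uncompressed raster at one bit per pixel (which thresholding into a binary image implies) with each row padded to a whole byte; the function name is illustrative and is not part of WISDOM++.

```python
def binary_bitmap_bytes(width_px: int, height_px: int) -> int:
    """Bytes needed for an uncompressed 1-bit-per-pixel bitmap.

    Assumption: each pixel row is padded up to a whole number of bytes.
    """
    bytes_per_row = (width_px + 7) // 8  # round up to the next full byte
    return bytes_per_row * height_px


# Page dimensions in pixels, as quoted for an A4 page scanned at 300 dpi.
A4_WIDTH_PX = 2496
A4_HEIGHT_PX = 3500

size_in_bytes = binary_bitmap_bytes(A4_WIDTH_PX, A4_HEIGHT_PX)
print(size_in_bytes)
```

With a width of 2,496 pixels each row occupies exactly 312 bytes, so no padding is involved and the result matches the 1,092,000 bytes stated above.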


    The distribution package

    WISDOM++ 2.0 is an application running under Windows 98 or higher.
    Download the distribution package (wisdom++.zip, 59.5 MB) and unzip it into a temporary directory.
    Sample multi-page documents are also available. They all belong to a single class (tpami) of scientific papers. They have been completely processed and you can load their layout and logical structures from the database provided with this distribution package. The database also includes a set of learned rules used in the document classification and understanding processes. To view document images made available in TIFF format, please be sure that they are unzipped in the "Doc" folder of the user "root" defined in the database.
    See the User Guide for further details about system requirements, installation and usage of the system.

    Warning: The system WISDOM++ 2.0 is free for evaluation, research and teaching purposes, but not for commercial purposes.

    Please Acknowledge



    Glossary

    Block: Rectangular area enclosing a portion of the document content.

    Frame: Rectangular area corresponding to a group of blocks.

    Layout structure: Structure which associates the content of a document with a hierarchy of layout objects (such as blocks, frames and pages). The leaves of the layout tree are the blocks, while the root represents the set of pages of the whole document. The intermediate nodes of the layout tree associated with each page may include several frames.

    Logical structure: Structure which associates the content of a document with a hierarchy of logical objects (such as sender/receiver of a business letter, title/authors of a scientific article, etc.).
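The two structures defined above can be pictured as one tree whose nodes optionally carry a logical label. The sketch below is only an illustration of that idea: the class `LayoutNode`, its fields and the sample labels are hypothetical and do not reflect how WISDOM++ stores its data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LayoutNode:
    """One node of a layout tree (hypothetical representation)."""
    kind: str  # "document", "page", "frame" or "block"
    # Bounding rectangle as (left, top, right, bottom) in pixels; None for the root.
    bbox: Optional[Tuple[int, int, int, int]] = None
    # Logical component associated with this layout component, if any.
    logical_label: Optional[str] = None
    children: List["LayoutNode"] = field(default_factory=list)

    def leaves(self) -> List["LayoutNode"]:
        """Return the leaf nodes below this node, in left-to-right order."""
        if not self.children:
            return [self]
        found: List["LayoutNode"] = []
        for child in self.children:
            found.extend(child.leaves())
        return found


# A one-page document: the root holds the pages, a frame groups two blocks.
title_block = LayoutNode("block", (300, 200, 2200, 320))
author_block = LayoutNode("block", (700, 360, 1800, 420))
header_frame = LayoutNode("frame", (300, 200, 2200, 420), logical_label="title",
                          children=[title_block, author_block])
body_block = LayoutNode("block", (300, 600, 1200, 3200), logical_label="body")
page = LayoutNode("page", (0, 0, 2496, 3500), children=[header_frame, body_block])
document = LayoutNode("document", children=[page])

print([leaf.kind for leaf in document.leaves()])
```

In this sample tree every leaf is a block and the root stands for the whole set of pages, in line with the glossary definition of the layout structure.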

    OCR (Optical Character Recognition):

    Skew angle: The orientation angle of the text baselines of a document image.

    Spread factor: Factor computed by WISDOM++ and used to set some parameters of the segmentation algorithm. In simple documents with few sparse regions the factor is greater than 1.0, while in complex documents with closely written text regions it is lower than 1.0.

    XML (eXtensible Markup Language): Metalanguage used by WISDOM++ to export the results, stored in the database, of the whole document image analysis and recognition process.
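As a rough illustration of what exporting labelled components to XML involves, the sketch below serializes a list of (logical label, text) pairs with Python's standard `xml.etree.ElementTree` module. The function `export_page`, the root element name `document` and its `class` attribute are assumptions made for this example; the actual element names are fixed by the class-specific DTD of WISDOM++, which is not reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def export_page(document_class: str, components: List[Tuple[str, str]]) -> str:
    """Serialize (logical label, OCR text) pairs as an XML string.

    Assumption: each logical label is a valid XML element name.
    """
    root = ET.Element("document", {"class": document_class})
    for label, text in components:
        element = ET.SubElement(root, label)  # one element per logical component
        element.text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical components of a page belonging to the "tpami" class.
xml_text = export_page("tpami", [("title", "A Sample Title"),
                                 ("author", "A. N. Author")])
print(xml_text)
```

Using the logical labels as element names keeps the result of document understanding visible in the exported file, which is the role the class-specific DTD plays in the real system.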
     



    Project team


    Related publications

    (in inverse chronological order)
     

    FAQs

    None yet available. Send all requests/comments to: Margherita Berardi,  Dipartimento di Informatica, Università degli Studi di Bari (Italy).



    berardi@di.uniba.it