Layout analysis
The layout analysis in WISDOM++
Layout analysis is the process of perceptual organization that detects structures among blocks. Its result is the layout structure of the document.
The layout analysis is performed in two steps:
- A global analysis of the document image, which determines possible areas containing paragraphs, sections, columns, figures and tables. This step is based on an iterative process in which the vertical and horizontal histograms of text blocks are analyzed alternately in order to detect columns and sections/paragraphs, respectively.
- A local analysis of the document, which groups together blocks that possibly fall within the same area. Three perceptual criteria are considered in this step: proximity (e.g., adjacent components belonging to the same column/area are equally spaced), continuity (e.g., overlapping components) and similarity (e.g., components of the same type with an almost equal height). Pairs of layout components that satisfy some of these criteria may be grouped together. Each layout component is associated with one of the following types: text, horizontal line, vertical line, picture, graphic and mixed. When the constituent blocks of a logical component are homogeneous, the logical component inherits their type; otherwise its type is set to mixed.
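The two steps above can be illustrated with a minimal Python sketch. It is not the WISDOM++ implementation: the bounding-box representation, the function names `profile`, `gaps` and `inherited_type`, and the `min_gap` threshold are all assumptions made for illustration. The first two functions build a histogram of block coverage along one axis and report the empty runs that could separate columns (x axis) or sections/paragraphs (y axis); the third applies the type-inheritance rule for a group of blocks.

```python
from typing import List, Tuple

# Hypothetical block representation: bounding box (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]


def profile(blocks: List[Box], axis: int, length: int) -> List[int]:
    """Histogram of block coverage along one axis (0 = x, 1 = y)."""
    hist = [0] * length
    for box in blocks:
        lo, hi = box[axis], box[axis + 2]
        for i in range(max(lo, 0), min(hi, length)):
            hist[i] += 1
    return hist


def gaps(hist: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Interior runs of empty bins at least `min_gap` wide, as (start, end),
    end exclusive. Runs touching either page margin are ignored."""
    runs = []
    start = None
    for i, value in enumerate(hist + [1]):  # sentinel closes a trailing run
        if value == 0 and start is None:
            start = i
        elif value != 0 and start is not None:
            if i - start >= min_gap:
                runs.append((start, i))
            start = None
    return [(s, e) for (s, e) in runs if s > 0 and e < len(hist)]


def inherited_type(block_types: List[str]) -> str:
    """Type of a component: the common type of its blocks, else 'mixed'."""
    if not block_types:
        raise ValueError("a component needs at least one block")
    unique = set(block_types)
    return unique.pop() if len(unique) == 1 else "mixed"


# Two text columns on a page 100 pixels wide (made-up coordinates).
blocks = [(10, 10, 40, 90), (60, 10, 90, 90)]
column_gaps = gaps(profile(blocks, axis=0, length=100), min_gap=15)
```

Here `column_gaps` holds the single white band between the two columns. An iterative analysis in the spirit of the text would alternate between the two axes, cutting each area at the gaps found and repeating on the resulting sub-areas.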
The layout structure extracted by WISDOM++ is a hierarchy with six levels: basic blocks, lines, sets of lines, frame1, frame2, and pages. A user who is not satisfied with the result of the layout analysis can act directly on the results of the segmentation process by deleting some blocks, or can modify the result of the global analysis. Indeed, the layout analysis process implemented in WISDOM++ performs poorly when the document structure is highly variable because of formulas, images, and drawings. The main source of errors made by the layout analysis module is the global analysis step, while the local analysis step performs satisfactorily once the result of the global analysis has been suitably corrected. To this end, the system also applies a set of corrective rules learned automatically by ATRE, the learning system that WISDOM++ incorporates. These rules are induced from a set of examples describing both the corrective operations performed by the user and the kind of layout to which they are applied. In particular, the user can correct the result of the global analysis by performing three different operations:
- a) Horizontal splitting of a column/section.
- b) Vertical splitting of a column/section.
- c) Grouping of two sections/columns into one.
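The three operations listed above can be sketched on rectangular areas as follows. This is an illustrative Python sketch under stated assumptions, not the WISDOM++ code: areas are taken to be axis-aligned boxes, "horizontal splitting" is read as a cut along a horizontal line (and "vertical splitting" as a cut along a vertical line), and the cut coordinate is a hypothetical parameter that the source does not describe.

```python
from typing import Tuple

# Hypothetical area representation: bounding box (x0, y0, x1, y1).
Box = Tuple[int, int, int, int]


def split_horizontal(area: Box, y: int) -> Tuple[Box, Box]:
    """Cut an area along the horizontal line at height y (assumed reading)."""
    x0, y0, x1, y1 = area
    if not y0 < y < y1:
        raise ValueError("cut must lie strictly inside the area")
    return (x0, y0, x1, y), (x0, y, x1, y1)


def split_vertical(area: Box, x: int) -> Tuple[Box, Box]:
    """Cut an area along the vertical line at abscissa x (assumed reading)."""
    x0, y0, x1, y1 = area
    if not x0 < x < x1:
        raise ValueError("cut must lie strictly inside the area")
    return (x0, y0, x, y1), (x, y0, x1, y1)


def group(a: Box, b: Box) -> Box:
    """Merge two areas into the smallest box that encloses both."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


section = (0, 0, 100, 200)
upper, lower = split_horizontal(section, 80)
left, right = split_vertical(section, 50)
merged = group(upper, lower)  # grouping undoes the horizontal split
```

Representing grouping as the enclosing box keeps the three operations closed over the same area type, so any sequence of corrections yields areas that the local analysis can process again.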
After each operation, WISDOM++ recomputes the result of the local analysis process, so that the user can immediately see the final effect of the requested correction and decide whether to confirm or reject it. Once the user has completed the correction process, WISDOM++ has a description of when and how the user modified the result of the global analysis, and generates the corresponding training observations.
The splitting operations can be described by means of a binary function, split(X,S), where X represents the column/section to be split, S is an ordinal number representing the step of the correction process, and the range of the split function is the set {horizontal, vertical, no_split}. The grouping operation can be described by a ternary predicate group(A,B,S), where A and B are the two grouped sections (columns) and S is the ordinal number representing the step of the correction process. The rule learning phase is run separately, at a different time. Some results on the learning of layout corrective rules are reported in Malerba et al., 2002, and details on the approach are given in Berardi et al., 2003.
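As a rough illustration of how such observations might be recorded, the Python sketch below models split(X,S) and group(A,B,S) as small immutable records. The class names, field names, area identifiers and the textual rendering are hypothetical; in particular, the strings produced by `as_literal` are only a readable notation and are not claimed to match the input syntax of ATRE.

```python
from dataclasses import dataclass

# Range of the split function, as given in the text.
SPLIT_VALUES = ("horizontal", "vertical", "no_split")


@dataclass(frozen=True)
class Split:
    """Observation split(X,S) = value."""
    area: str   # X: the column/section considered
    step: int   # S: ordinal step of the correction process
    value: str  # one of SPLIT_VALUES

    def __post_init__(self) -> None:
        if self.value not in SPLIT_VALUES:
            raise ValueError(f"unknown split value: {self.value}")

    def as_literal(self) -> str:
        return f"split({self.area},{self.step})={self.value}"


@dataclass(frozen=True)
class Group:
    """Observation group(A,B,S)."""
    a: str      # A: first grouped section/column
    b: str      # B: second grouped section/column
    step: int   # S: ordinal step of the correction process

    def as_literal(self) -> str:
        return f"group({self.a},{self.b},{self.step})"


# A made-up correction session: one vertical split, then one grouping.
session = [Split("c1", 1, "vertical"), Group("c2", "c3", 2)]
literals = [obs.as_literal() for obs in session]
```

Recording the step S with every observation preserves the order of the corrections, which matters because each operation changes the layout on which the next one is applied.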
Bibliography
- D. Malerba, F. Esposito & O. Altamura (2002). Adaptive layout analysis of document images, in M.-S. Hacid, Z.W. Ras, D.A. Zighed & Y. Kodratoff (Eds.), Foundations of Intelligent Systems, 13th International Symposium, ISMIS'2002, Lecture Notes in Artificial Intelligence, 2366, 526-534, Springer, Berlin, Germany.
- M. Berardi, M. Ceci, F. Esposito & D. Malerba (2003). Learning Logic Programs for Layout Analysis Correction, Proceedings of the 20th International Conference on Machine Learning (ICML 2003), 27-34.
berardi@di.uniba.it