Layout analysis

The layout analysis in WISDOM++

Layout analysis is the process of perceptual organization that detects structures among blocks. Its result is the layout structure of the document.
The layout analysis is performed in two steps: a global analysis, followed by a local analysis.

The layout structure extracted by WISDOM++ is a hierarchy with six levels: basic blocks, lines, set of lines, frame1, frame2, and pages.

If the user is not satisfied with the result of the layout analysis, they can either act directly on the result of the segmentation process by deleting some blocks, or modify the result of the global analysis. This is needed because the layout analysis implemented in WISDOM++ performs poorly when the document structure is highly variable owing to the presence of formulas, images, and drawings. Most errors made by the layout analysis module arise in the global analysis step, whereas the local analysis step performs satisfactorily once the result of the global analysis has been suitably corrected. To this end, the system also applies a set of corrective rules learned automatically by ATRE, the learning system that WISDOM++ incorporates. These rules are induced from a set of examples describing both the corrective operations performed by the user and the kind of layout to which they are applied.

In particular, the user can correct the result of the global analysis by performing three different operations: splitting a column/section horizontally, splitting it vertically, and grouping two sections (columns). After each operation, WISDOM++ recomputes the result of the local analysis, so that the user can immediately see the final effect of the requested correction and decide whether to confirm or reject it. Once the user has completed the correction process, WISDOM++ has a description of when and how the user modified the result of the global analysis, and generates the corresponding training observations.

The splitting operations can be described by means of a binary function, split(X,S), where X represents the column/section to be split, S is an ordinal number representing the step of the correction process, and the range of the function is the set {horizontal, vertical, no_split}.
The grouping operation can be described by a ternary predicate, group(A,B,S), where A and B are the two grouped sections (columns) and S is the ordinal number representing the step of the correction process. The rule learning phase is run separately, at a later time. Some results on the learning of layout corrective rules are reported in Malerba et al., 2002, and details on the approach are given in Berardi et al., 2003.
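
As an illustration only, the split function and the group predicate described above could be turned into training observations along the following lines. This is a minimal sketch in Python, not the WISDOM++ or ATRE implementation: the function names, the format of the correction log, the textual syntax of the generated literals, and the choice to number steps from 1 are all assumptions made for this example.

```python
# Hypothetical sketch: encoding a user's correction session as
# split(X,S)=value and group(A,B,S) observations.
# All names and the literal syntax are assumptions, not WISDOM++ code.

# Range of the split function, as given in the text.
SPLIT_VALUES = {"horizontal", "vertical", "no_split"}


def split_literal(block, step, value):
    """Render split(X,S)=value for block X at correction step S."""
    if value not in SPLIT_VALUES:
        raise ValueError(f"unknown split value: {value!r}")
    return f"split({block},{step})={value}"


def group_literal(first, second, step):
    """Render group(A,B,S) for blocks A and B at correction step S."""
    return f"group({first},{second},{step})"


def observations_from_log(log):
    """Convert an ordered log of user operations into observations.

    Each entry is either ("split", block, value) or
    ("group", first, second). The position of the entry in the log
    is used as the ordinal step S (assumed here to start at 1).
    """
    observations = []
    for step, (kind, *args) in enumerate(log, start=1):
        if kind == "split":
            block, value = args
            observations.append(split_literal(block, step, value))
        elif kind == "group":
            first, second = args
            observations.append(group_literal(first, second, step))
        else:
            raise ValueError(f"unknown operation: {kind!r}")
    return observations


if __name__ == "__main__":
    # Hypothetical session: two splits and one grouping.
    session = [
        ("split", "col1", "vertical"),
        ("group", "sec2", "sec3"),
        ("split", "col4", "horizontal"),
    ]
    for literal in observations_from_log(session):
        print(literal)
```

Using the position in the log as the step number keeps the order of the corrections explicit, which matters because each operation is applied to the layout produced by the previous ones.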

 




berardi@di.uniba.it