Summary of Advances in document layout analysis

Vicente Bosch Campos

  • Handwritten Text Segmentation (HTS) is a task within the Document Layout Analysis field that aims to detect and extract the different page regions of interest found in handwritten documents. HTS remains an active topic that has gained importance over the years, owing to the increasing demand for textual access to the myriad handwritten document collections held by archives and libraries.

    This thesis considers HTS as a task that must be tackled in two specialized phases: detection and extraction. We see the detection phase fundamentally as a recognition problem that yields the vertical position of each region of interest as a by-product. The extraction phase consists of computing the best contour coordinates of the region using the position information provided by the detection phase.

    Our proposed detection approach handles both higher-level regions (paragraphs, diagrams, etc.) and lower-level regions such as text lines. For text line detection, we model the problem so that the vertical position yielded by the system approximates the imaginary line that connects the lower part of the grapheme bodies in a text line, commonly known as the baseline.

    One of the main contributions of this thesis is that the proposed modelling approach allows us to include prior information about the layout of the documents being processed. This is done via a Vertical Layout Model (VLM).

    We develop a Hidden Markov Model (HMM) based framework that tackles region detection and classification as an integrated task, and we study the performance and ease of use of the proposed approach on many corpora. We review how simply our approach can be modelled to process regions at different levels of information: text lines, paragraphs, titles, etc. We also study the impact of adding deterministic and/or probabilistic prior information and restrictions via the VLM that our approach provides.
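
    As a rough illustration of treating detection as a recognition problem over vertical positions, the sketch below decodes a two-state HMM (gap, text) over the pixel rows of a page with the Viterbi algorithm. Everything in it is an assumption made for illustration and not the thesis's actual models: the per-row ink density used as observation, the toy emission model, the `stay` probability, and the function names `viterbi` and `detect_baselines`. The transition matrix and the initial distribution stand in, very loosely, for the kind of prior layout information a VLM could encode, and the last row of each decoded text run is only a crude proxy for a baseline.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Most likely state sequence for a (T, S) matrix of log emission scores."""
    T, S = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score of being in state i at t-1 and moving to j
        cand = score[:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def detect_baselines(profile, stay=0.9):
    """Label each pixel row as gap (0) or text (1).

    profile: per-row ink density in [0, 1] (hypothetical observation).
    Returns the last row of every decoded text run.
    """
    p_text = np.clip(np.asarray(profile, dtype=float), 1e-3, 1 - 1e-3)
    # Toy emission model: dense rows look like text, sparse rows like gaps
    log_emit = np.log(np.stack([1 - p_text, p_text], axis=1))
    # Toy layout prior: regions tend to persist, and a page starts with a gap
    log_trans = np.log(np.array([[stay, 1 - stay], [1 - stay, stay]]))
    log_init = np.log(np.array([0.99, 0.01]))
    states = viterbi(log_emit, log_trans, log_init)
    last = len(states) - 1
    return [t for t, s in enumerate(states)
            if s == 1 and (t == last or states[t + 1] == 0)]

# Two synthetic text lines separated by gaps
profile = [0.05] * 3 + [0.9] * 4 + [0.05] * 3 + [0.9] * 4 + [0.05] * 2
print(detect_baselines(profile))
```

    Raising `stay` makes the decoder more reluctant to switch region type, which is one simple way prior layout knowledge can override noisy local evidence.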

    Having a separate phase that accurately yields the detection position of each region (baselines in the case of text lines) greatly simplifies the problem that must be tackled during the extraction phase. In this thesis we propose to use a distance map that takes into account the grey-scale information in the image. This allows us to produce extraction frontiers that are equidistant from the adjacent text regions. We study how the accuracy of our approach improves in proportion to the quality of the vertical positions provided by the detection phase. Our extraction approach gives near-perfect results when human-reviewed baselines are provided.
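
    The idea of a grey-scale-aware distance map can be sketched as follows. The abstract does not say how grey levels enter the distance, so the weighting below is an assumption: stepping onto ink is cheap and stepping onto background is expensive, so strokes attached to a baseline stay with that baseline. The names `grey_weighted_labels` and `frontier`, the parameter `alpha`, and the use of multi-source Dijkstra on a 4-connected grid are likewise illustrative choices, not the thesis's method. Each pixel is assigned to the nearer of two baselines, and the frontier is where ownership changes.

```python
import heapq
import numpy as np

def grey_weighted_labels(ink, upper_row, lower_row, alpha=10.0):
    """Assign each pixel to the nearer baseline under an ink-weighted distance.

    ink: 2-D array in [0, 1], where 1 means dark ink.
    Returns an array with 0 for the upper baseline and 1 for the lower one.
    """
    h, w = ink.shape
    dist = np.full((h, w), np.inf)
    label = np.full((h, w), -1, dtype=int)
    heap = []
    # Seed the search with every pixel on both baselines
    for lab, row in enumerate((upper_row, lower_row)):
        for x in range(w):
            dist[row, x] = 0.0
            label[row, x] = lab
            heap.append((0.0, row, x, lab))
    heapq.heapify(heap)
    while heap:
        d, y, x, lab = heapq.heappop(heap)
        if d > dist[y, x]:
            continue  # stale queue entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                # Assumed weighting: background costs 1 + alpha, ink costs 1
                nd = d + 1.0 + alpha * (1.0 - float(ink[ny, nx]))
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    label[ny, nx] = lab
                    heapq.heappush(heap, (nd, ny, nx, lab))
    return label

def frontier(label, upper_row, lower_row):
    """Per column, the first row between the baselines owned by the lower line."""
    rows = []
    for x in range(label.shape[1]):
        col = label[upper_row:lower_row + 1, x]
        rows.append(upper_row + int(np.argmax(col == 1)))
    return rows

# Blank strip between baselines at rows 0 and 7: the frontier sits midway
blank = np.zeros((8, 5))
print(frontier(grey_weighted_labels(blank, 0, 7), 0, 7))

# A descender hanging from the upper line bends the frontier downwards
page = np.zeros((8, 5))
page[1:5, 2] = 1.0
print(frontier(grey_weighted_labels(page, 0, 7), 0, 7))
```

    On the blank strip every frontier point lies halfway between the two baselines, matching the equidistance property described above. With the descender present, the frontier in that column moves below the stroke instead of cutting through it.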

