A Set of Benchmarks for Handwritten Text Recognition on Historical Documents
[EN] Handwritten Text Recognition is a important requirement in order to make visible the contents of the myriads of historical documents residing in public and private archives and libraries world wide. Automatic Handwritten Text Recognition (HTR) is a challenging problem that requires a careful co...
| Autores: | , , , , |
|---|---|
| Tipo de recurso: | artículo |
| Fecha de publicación: | 2019 |
| País: | España |
| Institución: | Universitat Politècnica de València (UPV) |
| Repositorio: | RiuNet. Repositorio Institucional de la Universitat Politécnica de Valéncia |
| Idioma: | inglés |
| OAI Identifier: | oai:riunet.upv.es:10251/156430 |
| Acceso en línea: | https://riunet.upv.es/handle/10251/156430 |
| Access Level: | acceso abierto |
| Palabra clave: | Historical handwritten text recognition Hidden Markov models Convolutional neural networks Recurrent neural networks Language modeling ESTADISTICA E INVESTIGACION OPERATIVA LENGUAJES Y SISTEMAS INFORMATICOS |
| Sumario: | [EN] Handwritten Text Recognition is a important requirement in order to make visible the contents of the myriads of historical documents residing in public and private archives and libraries world wide. Automatic Handwritten Text Recognition (HTR) is a challenging problem that requires a careful combination of several advanced Pattern Recognition techniques, including but not limited to Image Processing, Document Image Analysis, Feature Extraction, Neural Network approaches and Language Modeling. The progress of this kind of systems is strongly bound by the availability of adequate benchmarking datasets, software tools and reproducible results achieved using the corresponding tools and datasets. Based on English and German historical documents proposed in recent open competitions at ICDAR and ICFHR conferences between 2014 and 2017, this paper introduces four HTR benchmarks in order of increasing complexity from several points of view. For each benchmark, a specific system is proposed which overcomes results published so far under comparable conditions. Therefore, this paper establishes new state of the art baseline systems and results which aim at becoming new challenges that would hopefully drive further improvement of HTR technologies. Both the datasets and the software tools used to implement the baseline systems are made freely accessible for research purposes. (C) 2019 Elsevier Ltd. All rights reserved. |
|---|