HTSR-Pollen: Handwritten Text Synthesis and Recognition System to Overcome Data Scarcity

Bibliographic Details
Authors: Neto, Arthur F. S.; Bezerra, Byron L. D.; Toselli, Alejandro Héctor
Resource type: article
Publication date: 2026
Country: Spain
Institution: Universitat Politècnica de València (UPV)
Repository: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Language: English
OAI Identifier: oai:dnet:riunet______::aa586329467f84e84ca0532747b05469
Online access: https://riunet.upv.es/handle/10251/234623
Access level: open access
Keywords: Data augmentation
Data synthesis
Handwriting synthesis
Handwritten text recognition
Description
Summary: [EN] Offline Handwritten Text Recognition (HTR) systems recognize and transcribe handwritten text from scanned images into digital formats. The field has become important because of the need for document digitization and data entry automation in various industries. Accurate recognition requires large and varied datasets for training optical models, but collecting and labeling such datasets is often time-consuming and impractical. Data augmentation and transfer learning are commonly used to address this challenge. Nevertheless, these traditional methods may lead to overfitting and performance degradation when data are scarce. This work proposes integrating Conditional Generative Adversarial Networks (CGANs) for data synthesis into optical model training to improve handwriting recognition in data-scarce scenarios. To validate the proposal, the authors conducted a study that included: (i) an exploration to establish an optimal configuration for traditional data augmentation; and (ii) extensive experiments using seven datasets, each partitioned into training subsets to simulate diverse data-scarcity conditions. Averaged over all subsets and optical models, data synthesis achieved the largest error reductions relative to the baseline trained from scratch without augmentation: it reduced the Character Error Rate (CER) by 41.1% and the Word Error Rate (WER) by 28.1%. Transfer learning achieved reductions of 34.4% in CER and 23.5% in WER, and traditional data augmentation achieved reductions of 13.8% in CER and 12.3% in WER. These findings highlight the importance of data synthesis for improving HTR systems, particularly in data-scarce contexts.
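Note on the metrics: the summary reports reductions in CER and WER but does not define them. The sketch below shows the conventional definitions, in which each rate is the Levenshtein edit distance between reference and recognized text divided by the reference length, counted in characters for CER and in words for WER. It is a minimal illustration, not the authors' evaluation code. The function names are hypothetical, and reading the reported percentages as relative reductions against the baseline is an assumption, since the record does not say whether they are relative or absolute.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits per reference character."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits per reference word (whitespace split)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def relative_reduction(baseline, improved):
    """Percentage reduction of an error rate relative to a baseline.

    Assumption: the record's percentages are relative reductions.
    """
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy strings for illustration only; not data from the paper.
    ref, hyp = "handwritten text", "handwriten text"
    print(cer(ref, hyp))  # one missing character over 16 -> 0.0625
    print(wer(ref, hyp))  # one wrong word over 2 -> 0.5
```

Under the relative reading, a baseline CER of 0.20 falling to 0.10 would count as a 50% reduction. These numbers are purely illustrative and do not come from the paper.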