A novel Spanish dataset for financial education text simplification targeting visually impaired individuals

Bibliographic Details
Authors: Pérez-Rojas, Nelson; Calderón Ramírez, Saúl; Solís, Martín; Romero-Sandoval, Mario Alberto; Arias-Monge, Monica; Saggion, Horacio
Resource type: article
Status: Published version
Publication date: 2025
Country: Spain
Institution: Universitat Pompeu Fabra
Repository: Repositorio Digital de la UPF
OAI Identifier: oai:repositori.upf.edu:10230/72449
Online access: https://hdl.handle.net/10230/72449
http://dx.doi.org/10.1109/ACCESS.2025.3568693
Access level: open access
Keywords: Automatic text simplification
Lexical simplification
Word complexity
Lexical complexity prediction
Description
Summary: Automatic Text Simplification (ATS) is a crucial task in natural language processing, aimed at making texts more comprehensible, particularly for specific groups such as individuals with visual impairments. One of the primary challenges in developing models for ATS is the scarcity of data, especially in Spanish. This manuscript introduces a novel dataset tailored for Spanish speakers with visual impairments, consisting of 5,314 pairs of original and simplified sentences created using established simplification rules. Additionally, we evaluate the feasibility of augmenting this dataset using large language models such as Generative Pre-trained Transformer 3 (GPT-3), TUNER, and Multilingual T5 (mT5). We compare the simplifications generated by these models with our dataset to assess their effectiveness in data augmentation. The characteristics of our dataset and the findings from these comparisons are discussed in detail. The dataset is publicly available on Hugging Face at https://huggingface.co/datasets/saul1917/FEINA.