Multilingual adaptative text simplification

Reading is an essential skill that plays a crucial role in our daily lives. It allows us to access information, gain knowledge, expand our understanding of the world around us, and build the foundation for learning, communication, and personal growth. However, many texts we encounter day after day o...

Descripción completa

Detalles Bibliográficos
Autor: Sheang, Kim Cheng
Tipo de recurso: tesis doctoral
Estado:Versión publicada
Fecha de publicación:2023
País:España
Institución:CBUC, CESCA
Repositorio:TDR. Tesis Doctorales en Red
OAI Identifier:oai:www.tdx.cat:10803/689317
Acceso en línea:http://hdl.handle.net/10803/689317
Access Level:acceso abierto
Palabra clave:Adaptive text simplification
Lexical simplification
Sentence simplification
Complex word identification
Controllable lexical and sentence simplification
Simplificació adaptativa de text
Simplificació lèxica
Simplificació de frases
Identificació de paraules complexes
Simplificació léxica i oracional controlable
62
Descripción
Sumario:Reading is an essential skill that plays a crucial role in our daily lives. It allows us to access information, gain knowledge, expand our understanding of the world around us, and build the foundation for learning, communication, and personal growth. However, many texts we encounter day after day often contain complex words or syntactic structures that can cause reading difficulties for certain groups of people; this motivates the need for Automatic Text Simplification (ATS). ATS is a Natural Language Processing (NLP) task that aims to reduce the linguistic complexity of a text while preserving its original information and meaning. It involves various operations, such as replacing complex words with simpler synonyms, splitting long sentences into shorter ones, and reorganizing the structure of the text. The goal of ATS is to make texts more accessible and understandable to a broader audience, including non-native speakers, children, and individuals with Dyslexia, Autism, Aphasia, Intellectual Disabilities, and Deaf and Hard of Hearing. In this work, we will discuss our proposed methods for Complex Word Identification (CWI), Lexical Simplification (LS), and Sentence Simplification (SS) in order to help improve reading comprehension. For CWI, we propose several systems based on different machine learning algorithms, such as Convolutional Neural Networks, CatBoost, and XGBoost with word embeddings and feature-engineered for identifying complex words in English, Spanish, German, and French texts. For LS, we propose two systems, monolingual English and multilingual system supporting English, Spanish, and Portuguese. For SS, we propose several systems to simplify English and Spanish texts. In both LS and SS, we explore the use of transfer learning and controllable mechanism, where the transfer learning help create the model that requires less amount of training data, and the controllable mechanism gives us the ability to adjust the outputs based on our preference, especially for different target audiences.