Multilingual adaptative text simplification

Sheang, Kim Cheng

Bibliographic details
Author:	Sheang, Kim Cheng
Resource type:	doctoral thesis
Status:	Published version
Publication date:	2023
Country:	Spain
Institution:	CBUC, CESCA
Repository:	TDR. Tesis Doctorales en Red
OAI Identifier:	oai:www.tdx.cat:10803/689317
Online access:	http://hdl.handle.net/10803/689317
Access level:	open access
Keywords:	Adaptive text simplification; Lexical simplification; Sentence simplification; Complex word identification; Controllable lexical and sentence simplification; 62

Contributors:	Saggion, Horacio; Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions
Abstract:	Reading is an essential skill that plays a crucial role in our daily lives. It allows us to access information, gain knowledge, expand our understanding of the world around us, and build the foundation for learning, communication, and personal growth. However, many texts we encounter every day contain complex words or syntactic structures that can cause reading difficulties for certain groups of people; this motivates the need for Automatic Text Simplification (ATS). ATS is a Natural Language Processing (NLP) task that aims to reduce the linguistic complexity of a text while preserving its original information and meaning. It involves operations such as replacing complex words with simpler synonyms, splitting long sentences into shorter ones, and reorganizing the structure of the text. The goal of ATS is to make texts more accessible and understandable to a broader audience, including non-native speakers; children; people with dyslexia, autism, aphasia, or intellectual disabilities; and people who are deaf or hard of hearing. In this work, we present our proposed methods for Complex Word Identification (CWI), Lexical Simplification (LS), and Sentence Simplification (SS), with the aim of improving reading comprehension. For CWI, we propose several systems based on different machine learning algorithms, such as Convolutional Neural Networks, CatBoost, and XGBoost, using word embeddings and engineered features to identify complex words in English, Spanish, German, and French texts. For LS, we propose two systems: a monolingual English system and a multilingual system supporting English, Spanish, and Portuguese. For SS, we propose several systems to simplify English and Spanish texts.
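The abstract says the CWI systems combine word embeddings with engineered features, but it does not list those features. The sketch below assumes three common hand-crafted signals: character length, a vowel-group syllable estimate, and an add-one-smoothed log frequency from a reference corpus. It shows how a per-word feature vector might be built. The function names and the toy corpus are hypothetical. In a full system, vectors like these (together with embeddings) could be passed to a classifier such as CatBoost or XGBoost, which is not shown here.

```python
import math
import re
from collections import Counter


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups (a heuristic, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def word_features(word: str, corpus_counts: Counter, total_tokens: int) -> list[float]:
    """Return [length, syllables, smoothed log relative frequency] for one target word."""
    freq = corpus_counts[word.lower()]  # Counter returns 0 for unseen words
    vocab_size = len(corpus_counts)
    # Add-one smoothing so unseen (likely rarer, harder) words still get a finite log frequency.
    log_freq = math.log((freq + 1) / (total_tokens + vocab_size))
    return [float(len(word)), float(count_syllables(word)), log_freq]


if __name__ == "__main__":
    # Hypothetical toy reference corpus; a real system would use a large frequency list.
    corpus = "the cat sat on the mat and the dog sat on the rug".split()
    counts = Counter(corpus)
    for w in ["cat", "ubiquitous"]:
        print(w, word_features(w, counts, len(corpus)))
```

Longer, rarer words get larger length and syllable values and a lower log frequency. A trained classifier could learn to associate that pattern with complexity.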
In both LS and SS, we explore the use of transfer learning and controllable mechanisms. Transfer learning helps create models that require less training data, and the controllable mechanisms let us adjust the outputs to our preferences, especially for different target audiences.
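The abstract does not specify how the controllable mechanism is implemented. One widely used approach is control tokens in the style of ACCESS (Martin et al.). In that approach, attribute ratios computed from aligned (complex, simple) pairs are prepended to the source, so that a sequence-to-sequence model learns to condition on them. At inference time, a user picks the ratios for a given audience. The sketch below assumes that approach. The token names `CHAR_RATIO` and `WORD_RATIO`, the 0.05 bucket size, and the helper functions are hypothetical and are not taken from the thesis. The model itself is not shown.

```python
def bucket(value: float, step: float = 0.05) -> float:
    """Round a ratio to a multiple of `step` so the model sees a small, fixed set of control tokens."""
    return round(round(value / step) * step, 2)


def control_prefix(source: str, target: str) -> str:
    """Training time: derive hypothetical control tokens from an aligned (complex, simple) pair."""
    char_ratio = len(target) / max(1, len(source))
    word_ratio = len(target.split()) / max(1, len(source.split()))
    return f"<CHAR_RATIO_{bucket(char_ratio)}> <WORD_RATIO_{bucket(word_ratio)}>"


def build_input(source: str, char_ratio: float, word_ratio: float) -> str:
    """Inference time: the user chooses the ratios (lower values request shorter, simpler output)."""
    return f"<CHAR_RATIO_{bucket(char_ratio)}> <WORD_RATIO_{bucket(word_ratio)}> {source}"


if __name__ == "__main__":
    # Training pair whose target is about half the length of the source.
    print(control_prefix("The cat sat on the mat.", "The cat sat."))
    # → <CHAR_RATIO_0.5> <WORD_RATIO_0.5>
    # Inference request for a moderately shorter output.
    print(build_input("The committee postponed the decision.", 0.8, 0.75))
    # → <CHAR_RATIO_0.8> <WORD_RATIO_0.75> The committee postponed the decision.
```

Bucketing keeps the token vocabulary small. Exposing the ratios as plain input tokens is one way to tailor outputs to different audiences without retraining the model.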
Language:	English
Licence:	CC BY-NC-SA 4.0 (http://creativecommons.org/licenses/by-nc-sa/4.0/); open access
Format:	184 p., application/pdf
Publisher:	Universitat Pompeu Fabra
Source:	TDX (Tesis Doctorals en Xarxa)
Doctoral programme:	Programa de Doctorat en Tecnologies de la Informació i les Comunicacions
Terms of use:	Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons licence: http://creativecommons.org/licenses/by-nc-sa/4.0/