The Corpus of Basque Simplified Texts (CBST)

In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural, by a court translator w...

Bibliographic Details
Authors: González Dios, Itziar; Aranzabe Urruzola, María Jesús; Díaz de Ilarraza Sánchez, Arantza
Resource type: article
Publication date: 2018
Country: Spain
Institution: Universidad del País Vasco
Repository: Addi. Archivo Digital para la Docencia y la Investigación
OAI Identifier: oai:addi.ehu.eus:10810/29667
Online access: http://hdl.handle.net/10810/29667
Access level: open access
Keywords: text simplification; monolingual parallel corpora; annotation scheme; Basque; sentence complexity; Spanish
Funding and acknowledgements: Itziar Gonzalez-Dios's work was funded by a Ph.D. grant from the Basque Government and a postdoctoral grant for new doctors from the Vice-rectory of Research of the University of the Basque Country (UPV/EHU). We are very grateful to the translator and the teacher who simplified the texts. We also want to thank Dominique Brunato, Felice Dell'Orletta and Giulia Venturi for their help with the Italian annotation scheme and their suggestions when analysing the corpus, and Oier Lopez de Lacalle for his help with the statistical analysis. We also want to express our gratitude to the anonymous reviewers for their comments and suggestions. This research was supported by the Basque Government (IT344-10) and the Spanish Ministry of Economy and Competitiveness, EXTRECM Project (TIN2013-46616-C2-1-R).
License statement: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Description: In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: a structural approach, by a court translator who follows easy-to-read guidelines, and an intuitive approach, by a teacher drawing on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be further classified into more specific operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.
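As an illustration of the structure the abstract describes (one original sentence, two simplified versions, and macro-operation annotations), the following is a minimal Python sketch. It is not the authors' format or tooling: the class names, the `approach` labels as plain strings, and the `count_operations` helper are all assumptions made for this example, and the sentences are placeholders rather than corpus data. Only the eight macro-operation names come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class MacroOperation(Enum):
    """The eight macro-operations named in the abstract."""
    DELETE = "delete"
    MERGE = "merge"
    SPLIT = "split"
    TRANSFORMATION = "transformation"
    INSERT = "insert"
    REORDERING = "reordering"
    NO_OPERATION = "no operation"
    OTHER = "other"


@dataclass
class SimplifiedVersion:
    # Hypothetical field: "structural" or "intuitive", after the two approaches.
    approach: str
    text: str
    operations: List[MacroOperation] = field(default_factory=list)


@dataclass
class CorpusEntry:
    # One original sentence paired with its simplified versions.
    original: str
    versions: List[SimplifiedVersion]


def count_operations(entries: List[CorpusEntry], approach: str) -> Counter:
    """Count macro-operations across all versions produced with one approach."""
    counts: Counter = Counter()
    for entry in entries:
        for version in entry.versions:
            if version.approach == approach:
                counts.update(version.operations)
    return counts


if __name__ == "__main__":
    # Placeholder strings only; these are not sentences from the corpus.
    entries = [
        CorpusEntry(
            original="<original sentence>",
            versions=[
                SimplifiedVersion(
                    "structural",
                    "<structural simplification>",
                    [MacroOperation.SPLIT, MacroOperation.DELETE],
                ),
                SimplifiedVersion(
                    "intuitive",
                    "<intuitive simplification>",
                    [MacroOperation.NO_OPERATION],
                ),
            ],
        )
    ]
    for approach in ("structural", "intuitive"):
        print(approach, dict(count_operations(entries, approach)))
```

Keeping the approach on each simplified version, rather than on the entry, makes it straightforward to tally operations per approach, which is the kind of comparison the abstract says the corpus is meant to support.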
Language: English
Publisher: Springer
Format: application/pdf
Related: info:eu-repo/grantAgreement/MINECO/TIN2013-46616-C2-1-R/
https://link.springer.com/article/10.1007%2Fs10579-017-9407-6
Rights: info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by/3.0/es/
Attribution 3.0 Spain