The Corpus of Basque Simplified Texts (CBST)
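In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural approach, by a court translator who considers easy-to-read guidelines, and the intuitive approach, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be subdivided into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.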
| Authors: | González Dios, Itziar; Aranzabe Urruzola, María Jesús; Díaz de Ilarraza Sánchez, Arantza |
|---|---|
| Resource type: | article |
| Publication date: | 2018 |
| Country: | Spain |
| Institution: | Universidad del País Vasco |
| Repository: | Addi. Archivo Digital para la Docencia y la Investigación |
| OAI Identifier: | oai:addi.ehu.eus:10810/29667 |
| Online access: | http://hdl.handle.net/10810/29667 |
| Access level: | open access |
| Keywords: | text simplification; monolingual parallel corpora; annotation scheme; Basque; sentence complexity; Spanish |
| id | ES_f0ca38c4e3a7b3cdcd268fe21f65512f |
|---|---|
| oai_identifier_str | oai:addi.ehu.eus:10810/29667 |
| network_acronym_str | ES |
| network_name_str | España |
| repository_id_str | |
| spelling | The Corpus of Basque Simplified Texts (CBST) · González Dios, Itziar; Aranzabe Urruzola, María Jesús; Díaz de Ilarraza Sánchez, Arantza · text simplification; monolingual parallel corpora; annotation scheme; Basque; sentence complexity; Spanish · In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural approach, by a court translator who considers easy-to-read guidelines, and the intuitive approach, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be subdivided into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque. · Itziar Gonzalez-Dios's work was funded by a Ph.D. grant from the Basque Government and a postdoctoral grant for the new doctors from the Vice-rectory of Research of the University of the Basque Country (UPV/EHU). We are very grateful to the translator and teacher that simplified the texts. We also want to thank Dominique Brunato, Felice Dell'Orletta and Giulia Venturi for their help with the Italian annotation scheme and their suggestions when analysing the corpus and Oier Lopez de Lacalle for his help with the statistical analysis. We also want to express our gratitude to the anonymous reviewers for their comments and suggestions. This research was supported by the Basque Government (IT344-10), and the Spanish Ministry of Economy and Competitiveness, EXTRECM Project (TIN2013-46616-C2-1-R). · Springer · 2018; 2018; 2018 · info:eu-repo/semantics/article · application/pdf · http://hdl.handle.net/10810/29667 · reponame:Addi. Archivo Digital para la Docencia y la Investigación · instname:Universidad del País Vasco · English · info:eu-repo/grantAgreement/MINECO/TIN2013-46616-C2-1-R/ · https://link.springer.com/article/10.1007%2Fs10579-017-9407-6 · info:eu-repo/semantics/openAccess · http://creativecommons.org/licenses/by/3.0/es/ · This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. · Attribution 3.0 Spain · oai:addi.ehu.eus:10810/29667 · 2026-06-18T09:23:17Z |
| dc.title.none.fl_str_mv | The Corpus of Basque Simplified Texts (CBST) |
| title | The Corpus of Basque Simplified Texts (CBST) |
| spellingShingle | The Corpus of Basque Simplified Texts (CBST) · González Dios, Itziar · text simplification; monolingual parallel corpora; annotation scheme; Basque; sentence complexity; Spanish |
| title_short | The Corpus of Basque Simplified Texts (CBST) |
| title_full | The Corpus of Basque Simplified Texts (CBST) |
| title_fullStr | The Corpus of Basque Simplified Texts (CBST) |
| title_full_unstemmed | The Corpus of Basque Simplified Texts (CBST) |
| title_sort | The Corpus of Basque Simplified Texts (CBST) |
| dc.creator.none.fl_str_mv | González Dios, Itziar; Aranzabe Urruzola, María Jesús; Díaz de Ilarraza Sánchez, Arantza |
| author | González Dios, Itziar |
| author_facet | González Dios, Itziar; Aranzabe Urruzola, María Jesús; Díaz de Ilarraza Sánchez, Arantza |
| author_role | author |
| author2 | Aranzabe Urruzola, María Jesús; Díaz de Ilarraza Sánchez, Arantza |
| author2_role | author; author |
| dc.subject.none.fl_str_mv | text simplification; monolingual parallel corpora; annotation scheme; Basque; sentence complexity; Spanish |
| topic | text simplification; monolingual parallel corpora; annotation scheme; Basque; sentence complexity; Spanish |
| description | In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural approach, by a court translator who considers easy-to-read guidelines, and the intuitive approach, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be subdivided into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque. |
| publishDate | 2018 |
| dc.date.none.fl_str_mv | 2018; 2018; 2018 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article |
| format | article |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/10810/29667 |
| url | http://hdl.handle.net/10810/29667 |
| dc.language.none.fl_str_mv | English |
| language_invalid_str_mv | English |
| dc.relation.none.fl_str_mv | info:eu-repo/grantAgreement/MINECO/TIN2013-46616-C2-1-R/; https://link.springer.com/article/10.1007%2Fs10579-017-9407-6 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/3.0/es/; Attribution 3.0 Spain |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | http://creativecommons.org/licenses/by/3.0/es/; Attribution 3.0 Spain |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Springer |
| publisher.none.fl_str_mv | Springer |
| dc.source.none.fl_str_mv | reponame:Addi. Archivo Digital para la Docencia y la Investigación; instname:Universidad del País Vasco |
| instname_str | Universidad del País Vasco |
| reponame_str | Addi. Archivo Digital para la Docencia y la Investigación |
| collection | Addi. Archivo Digital para la Docencia y la Investigación |
| repository.name.fl_str_mv | |
| repository.mail.fl_str_mv | |
| _version_ | 1869424040831090688 |
| score | 15.300724 |
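
The abstract in this record names eight macro-operations in the CBST annotation scheme. As a rough illustration only, the sketch below shows one way those labels could be represented and tallied in Python. The class name `MacroOperation`, the function `count_macro_operations` and the sample labels are assumptions made for this example; they do not come from the corpus, its annotation files or the authors' tooling, and the finer-grained operations are not modelled because the record does not list them.

```python
from collections import Counter
from enum import Enum


class MacroOperation(Enum):
    """The eight macro-operations named in the abstract (identifiers are hypothetical)."""

    DELETE = "delete"
    MERGE = "merge"
    SPLIT = "split"
    TRANSFORMATION = "transformation"
    INSERT = "insert"
    REORDERING = "reordering"
    NO_OPERATION = "no operation"
    OTHER = "other"


def count_macro_operations(labels):
    """Count occurrences of each macro-operation label.

    A label that is not one of the eight names raises ValueError,
    because Enum lookup by value rejects unknown values.
    """
    return Counter(MacroOperation(label) for label in labels)


# Hypothetical labels for a handful of simplified sentences, not real corpus data.
example_labels = ["split", "delete", "split", "no operation"]
counts = count_macro_operations(example_labels)
```

Using an `Enum` rather than free strings means a misspelled label fails at lookup time instead of being counted as a separate category.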