Corpus PaGeS: A multifunctional resource for language learning, translation and cross-linguistic research

Bibliographic Details
Authors: Doval Reixa, Irene; Fernández Lanza, Santiago; Jiménez Juliá, Tomás; Liste Lamas, Elsa; Lübke, Barbara
Resource type: book chapter
Publication date: 2019
Country: Spain
Institution: Universidad de Santiago de Compostela (USC)
Repository: Minerva. Repositorio Institucional de la Universidad de Santiago de Compostela
Language: English
OAI Identifier: oai:minerva.usc.gal:10347/39334
Online access: https://hdl.handle.net/10347/39334
Access level: open access
Keywords: Parallel corpora
Corpus alignment
Corpus visualization
Spanish/German
5701 Applied linguistics
Description
Summary: This chapter presents the bilingual parallel corpus PaGeS, compiled by the research group SpatiAlEs from the University of Santiago de Compostela. PaGeS currently comprises nearly 20 million tokens and consists of texts originally written in German and in Spanish and their corresponding translations into the other language, as well as a small portion of German and Spanish translations from third languages. The present contribution introduces the main characteristics of the PaGeS corpus, focusing on its design and compilation. It first explains the criteria for the selection of the texts and the details of text pre-processing, automatic alignment and manual review. It then addresses the search and display features, describing the server architecture and indexing process. Finally, the intended development of the PaGeS corpus is briefly discussed.