Parallel Corpora Spanish (PaCorES): A collection of multifunctional parallel corpora
| Authors: | , |
|---|---|
| Resource type: | article |
| Publication date: | 2026 |
| Country: | Spain |
| Institution: | Universidad de Santiago de Compostela (USC) |
| Repository: | Minerva. Repositorio Institucional de la Universidad de Santiago de Compostela |
| Language: | English |
| OAI Identifier: | oai:minerva.usc.gal:10347/39335 |
| Online access: | https://hdl.handle.net/10347/39335 |
| Access Level: | open access |
| Keywords: | Parallel corpora; Bidirectional corpora; Corpus multifunctionality; Corpus applications; Corpus compilation; Corpus alignment; 5701 Applied linguistics |
| Summary: | The objective of this work is to provide researchers in the field of corpus linguistics with proper documentation on the PaCorES project (www.pacores.eu). The PaCorES project was created with the aim of building a collection of bidirectional parallel bilingual corpora with Spanish as the central language. The corpora currently included in the collection, in order of creation, are as follows: 1) The Parallel Corpus German<>Spanish, PaGeS, www.corpuspages.eu; 2) The Parallel Corpus English<>Spanish, PaEnS, www.corpuspaens.eu; 3) The Parallel Corpus Chinese<>Spanish, PaCheS, www.corpuspaches.eu; 4) The Parallel Corpus French<>Spanish, PaFreS, www.corpuspafres.eu. First, the authors identify the gaps and deficiencies in the landscape of bilingual and multilingual parallel corpora that include Spanish as one of the languages. Additionally, they highlight the inclusion of a particularly rare language pair, Chinese/Spanish, which has great potential due to the number of users. Next, they present the criteria that guided the design and architecture of the corpora to overcome these deficiencies. The paper emphasizes that the PaCorES corpora are fully accessible and stable: they can be freely consulted online without restrictions, and stability is guaranteed because the corpora are published successively in clearly identified versions. Currently, the core PaCorES corpora include a collection of contemporary prose texts, mostly fiction. Such texts are underrepresented in parallel corpora because they are difficult to obtain. They offer proven quality thanks to editorial control, and their translations were produced by professionals. The corpora are annotated with detailed metatextual information, documenting not only the complete source of the texts but also other data such as the translation direction, the degree of literalness, and the translator’s intervention. The next section covers the alignment process: the software used, the F1 score achieved, and the manual review of the alignment. The search architecture is then explained, emphasizing the availability of three levels of search to accommodate different user needs and detailing the functionalities of the interface and the presentation of results. Finally, the authors highlight that not only the individual components of PaCorES but also the project as a whole are designed with flexibility in mind. New language pairs can be added within the same collection architecture, and new texts can be incorporated into the individual components. The authors conclude that all these features make the PaCorES corpora a truly multifunctional resource that meets the needs of a wide variety of users. The collection serves specialists in fields such as natural language processing (NLP), lexicography, contrastive linguistics, translation studies, and language and translation teaching. Moreover, the ease of use of its search and visualization functions, along with the fast retrieval speed, allows the PaCorES collection to be used as an educational resource in the classroom. In this context, intermediate to advanced students can discover numerous translation suggestions for a given term, presented directly through reliable usage examples. |