A large Spanish-Catalan parallel corpus release for machine translation
We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The produced corpus of 7:5M parallel sentences (around 180M words per language) is useful for many natural language applications. We report excellent results when buildi...
| Autores: | , , , , |
|---|---|
| Tipo de recurso: | artículo |
| Estado: | Versión publicada |
| Fecha de publicación: | 2014 |
| País: | España |
| Institución: | Varias* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya) |
| Repositorio: | Recercat. Dipósit de la Recerca de Catalunya |
| OAI Identifier: | oai:recercat.cat:10230/26266 |
| Acceso en línea: | http://hdl.handle.net/10230/26266 |
| Access Level: | acceso abierto |
| Palabra clave: | Catalan-Spanish parallel corpus Machine translation |
| Sumario: | We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The produced corpus of 7:5M parallel sentences (around 180M words per language) is useful for many natural language applications. We report excellent results when building a statistical machine translation system trained on this parallel corpus. The Spanish-Catalan corpus is partially available via ELDA (Evaluations and Language Resources Distribution Agency) in catalog number ELRA-W0053. |
|---|