Bilingual newsgroups in Catalonia: a challenge for machine translation

Bibliographic Details
Authors: Climent, Salvador; Moré López, Joaquim; Oliver, Antoni; Salvatierra Mallarach, Míriam; Sánchez Sáiz, Imma; Taulé Delor, Mariona; Vallmanya Cucurull, Lluïsa
Resource type: article
Status: Version accepted for publication
Publication date: 2003
Country: Spain
Institution: Universitat Oberta de Catalunya (UOC)
Repository: O2, UOC institutional repository
OAI Identifier: oai:openaccess.uoc.edu:10609/108926
Online access: http://hdl.handle.net/10609/108926
Access level: open access
Keywords: bilingualism
Catalan language
Spanish language
machine translation
català
bilingüisme
castellà
traducció automàtica
bilingüismo
catalán
castellano
traducción automática
Machine translating
Traducció automàtica
Traducción automática
Description
Summary: This paper presents a linguistic analysis of a corpus of messages written in Catalan and Spanish, which come from several informal newsgroups on the Universitat Oberta de Catalunya (Open University of Catalonia; henceforth, UOC) Virtual Campus. The surrounding environment is one of extensive bilingualism and contact between Spanish and Catalan. The study was carried out as part of the INTERLINGUA project conducted by the UOC's Internet Interdisciplinary Institute (IN3). Its main goal is to ascertain the linguistic characteristics of the e-mail register in the newsgroups in order to assess their implications for the creation of an online machine translation environment. The results shed empirical light on the relevance of characteristics of the e-mail register, the impact of language contact and interference, and their implications for the use of machine translation for CMC data in order to facilitate cross-linguistic communication on the Internet.