Bilingual newsgroups in Catalonia: a challenge for machine translation

Bibliographic details
Authors: Climent, Salvador; Moré López, Joaquim; Oliver, Antoni; Salvatierra Mallarach, Míriam; Sánchez Sáiz, Imma; Taulé Delor, Mariona; Vallmanya Cucurull, Lluïsa
Format: article
Status: Version accepted for publication
Publication date: 2003
Country: Spain
Resources: Universitat Oberta de Catalunya (UOC)
Repository: O2, UOC institutional repository
OAI Identifier: oai:openaccess.uoc.edu:10609/108926
Online access: http://hdl.handle.net/10609/108926
Access Level: open access
Keywords: bilingualism
Catalan language
Spanish language
machine translation
català
bilingüisme
castellà
traducció automàtica
bilingüismo
catalán
castellano
traducción automática
Machine translating
Traducció automàtica
Traducción automática
Description
Abstract: This paper presents a linguistic analysis of a corpus of messages written in Catalan and Spanish, which come from several informal newsgroups on the Universitat Oberta de Catalunya (Open University of Catalonia; henceforth, UOC) Virtual Campus. The surrounding environment is one of extensive bilingualism and contact between Spanish and Catalan. The study was carried out as part of the INTERLINGUA project conducted by the UOC's Internet Interdisciplinary Institute (IN3). Its main goal is to ascertain the linguistic characteristics of the e-mail register in the newsgroups in order to assess their implications for the creation of an online machine translation environment. The results shed empirical light on the relevance of characteristics of the e-mail register, the impact of language contact and interference, and their implications for the use of machine translation for CMC data in order to facilitate cross-linguistic communication on the Internet.