Description of the Chinese-to-Spanish rule-based machine translation system developed with a hybrid combination of human annotation and statistical techniques

Bibliographic Details
Authors: Ruiz Costa-Jussà, Marta (ORCID: 0000-0002-5703-520X); Centelles, Jordi
Resource type: article
Publication date: 2015
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/104736
Online access: https://hdl.handle.net/2117/104736
https://dx.doi.org/10.1145/2738045
Access level: open access
Keywords: Machine translating
Statistics
Experimentation and languages
Rule-based machine translation
Statistical techniques
Chinese-to-Spanish
Machine translation
Chinese language -- Machine translation
UPC subject areas::Teaching and learning
UPC subject areas::Telecommunications engineering
Description
Summary: Two of the most popular machine translation (MT) paradigms are rule-based MT (RBMT) and corpus-based MT, which includes statistical systems (SMT). When little parallel data is available, RBMT becomes particularly attractive. This is the case for the Chinese-Spanish language pair. This article presents the first RBMT system for Chinese to Spanish. We describe a hybrid method for constructing this system that takes advantage of available resources, such as parallel corpora, which are used to extract dictionaries and lexical and structural transfer rules. The final system is open source and freely available online. Although its performance lags behind standard SMT systems on an in-domain test set, the results show that the RBMT system's coverage is competitive and that it outperforms the SMT system on an out-of-domain test set. The system is available to the general public, can be further enhanced, and opens up the possibility of creating future hybrid MT systems.