Linguistic-family-specific Encoders and Decoders for Multilingual Spoken Machine Translation


Bibliographic Details
Author: Qin, Haoru
Resource type: master's thesis
Publication date: 2022
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/383299
Online access: https://hdl.handle.net/2117/383299
Access level: open access
Keywords: Natural language processing (Computer science)
Decoders (Electronics)
Machine translating
multilingual machine translation
spoken language translation
natural language processing
neural machine translation
Tractament del llenguatge natural (Informàtica)
Descodificadors (Electrònica)
Traducció automàtica
Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic
Description
Summary: This project provides a spoken language translation (SLT) system trained on the UN Parallel Corpus (UNPC) and MuST-C, with the aim of studying the correlation between the linguistic families of languages and translation performance. The SLT system consists of a text-to-text Neural Machine Translation model, whose dataset includes six languages from five linguistic families, and an Automatic Speech Recognition model, whose dataset contains four languages from four linguistic families. The combined SLT system is an end-to-end system, which is a relatively new task, and the project analyzes how different linguistic families perform when trained under the same conditions. Apart from measuring performance with the BLEU score, the project also carries out fine-tuning and zero-shot translation experiments. In general, the BLEU scores obtained are good and similar to those of the original baseline models reported in the UNPC and MuST-C papers. The fine-tuning and zero-shot translation experiments also obtained reasonable results, supporting the hypothesized positive correlation between the closeness of languages and translation performance.