Basque and Spanish Multilingual TTS Model for Speech-to-Speech Translation

Bibliographic Details
Author: De Zuazo Oteiza, Xabier
Resource type: master's thesis
Publication date: 2023
Country: Spain
Institution: Universidad del País Vasco
Repository: Addi. Archivo Digital para la Docencia y la Investigación
OAI Identifier: oai:addi.ehu.eus:10810/61815
Online access: http://hdl.handle.net/10810/61815
Access level: open access
Keywords: multilingual multi-speaker text-to-speech
speech-to-text
machine translation
speech-to-speech translation
cross-lingual zero-shot voice conversion
Basque
Spanish
Description
Summary: [EN] Recently, several Text-to-Speech models have emerged that use deep neural networks to synthesize audio from text. In this work, a state-of-the-art multilingual and multi-speaker Text-to-Speech model has been trained on Basque, Spanish, Catalan, and Galician. The research consisted of gathering the datasets, pre-processing their audio and text data, training the model on the languages in successive steps, and evaluating the results at each step. For training, a transfer learning approach was used, starting from a model already trained on three languages: English, Portuguese, and French. The final model created here therefore supports a total of seven languages. These models also support zero-shot voice conversion, using an input audio file as a reference. Finally, a prototype application has been created to perform Speech-to-Speech Translation by combining the models trained here with other models from the community. Along the way, Deep Speech Speech-to-Text models have also been produced for Basque and Galician.