Basque and Spanish Multilingual TTS Model for Speech-to-Speech Translation
[EN] Lately, multiple Text-to-Speech models have emerged using Deep Neural networks to synthesize audio from text. In this work, the state-of-the-art multilingual and multi-speaker Text-to-Speech model has been trained in Basque, Spanish, Catalan, and Galician. The research consisted of gathering th...
| Autor: | |
|---|---|
| Tipo de recurso: | tesis de maestría |
| Fecha de publicación: | 2023 |
| País: | España |
| Institución: | Universidad del País Vasco |
| Repositorio: | Addi. Archivo Digital para la Docencia y la Investigación |
| OAI Identifier: | oai:addi.ehu.eus:10810/61815 |
| Acceso en línea: | http://hdl.handle.net/10810/61815 |
| Access Level: | acceso abierto |
| Palabra clave: | multilingual multi-speaker text-to-speech speech-to-text machine translation speech-to-speech translation cross-lingual zero-shot voice conversion Basque Spanish |
| Sumario: | [EN] Lately, multiple Text-to-Speech models have emerged using Deep Neural networks to synthesize audio from text. In this work, the state-of-the-art multilingual and multi-speaker Text-to-Speech model has been trained in Basque, Spanish, Catalan, and Galician. The research consisted of gathering the datasets, pre-processing their audio and text data, training the model in the languages in different steps, and evaluating the results at each point. For the training step, a transfer learning approach has been used from a model already trained in three languages: English, Portuguese, and French. Therefore, the final model created here supports a total of seven languages. Moreover, these models also support zero-shot voice conversion, using an input audio file as a reference. Finally, a prototype application has been created to do Speech-to-Speech Translation, putting together the models trained here and other models from the community. Along the way, some Deep Speech Speech-to-Text models have been generated for Basque and Galician. |
|---|