Speaker diarization and speech recognition in the semi-automatization of audio description

Bibliographic details
Authors: Delgado Flores, Héctor; Matamala, Anna (ORCID: 0000-0002-1607-9011); Serrano, Javier (ORCID: 0000-0003-1235-2145)
Document type: article
Publication date: 2015
Country: Spain
Resources: Universitat Autònoma de Barcelona
Repository: Dipòsit Digital de Documents de la UAB
Language: English
OAI Identifier: oai:ddd.uab.cat:144880
Online access: https://ddd.uab.cat/record/144880
https://dx.doi.org/urn:doi:10.5007/2175-7968.2015v35n2p308
Access level: Open access
Keywords: Audio description
Accessibility
Speaker diarization
Speech recognition
Technology
Audiodescripción
Accesibilidad
Diarización
Reconocimiento de habla
Tecnología
Description
Abstract: This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.