Speak2Subs: Evaluating State-of-the-Art Speech Recognition Models and Compliant Subtitle Generation

With recent advances in large language models, the evolution of speech-to-text tasks has been exponential. While state-of-the-art automatic speech recognition (ASR) models have taken a big step in speech transcription, creating quality subtitles still requires human intervention. This project has t...


Bibliographic Details
Author: Fresneda García, Julio
Resource type: master's thesis
Publication date: 2024
Country: Spain
Institution: Universidad Nacional de Educación a Distancia
Repository: e-spacio. Repositorio Institucional de la UNED
Language: English
OAI Identifier: oai:e-spacio.uned.es:20.500.14468/22596
Online access: https://hdl.handle.net/20.500.14468/22596
Access level: open access
Keywords: 1203.04 Artificial intelligence
ASR
LLM
Speech-To-Text
Subtitle
id ES_d8a7b5dd2b1cb753b8ebb5bbe94ece93
oai_identifier_str oai:e-spacio.uned.es:20.500.14468/22596
network_acronym_str ES
network_name_str España
repository_id_str
dc.title.none.fl_str_mv Speak2Subs: Evaluating State-of-the-Art Speech Recognition Models and Compliant Subtitle Generation
title Speak2Subs: Evaluating State-of-the-Art Speech Recognition Models and Compliant Subtitle Generation
spellingShingle Speak2Subs: Evaluating State-of-the-Art Speech Recognition Models and Compliant Subtitle Generation
Fresneda García, Julio
1203.04 Inteligencia artificial
ASR
LLM
Speech-To-Text
Subtitle
title_short Speak2Subs: Evaluating State-of-the-Art Speech Recognition Models and Compliant Subtitle Generation
title_full Speak2Subs: Evaluating State-of-the-Art Speech Recognition Models and Compliant Subtitle Generation
title_fullStr Speak2Subs: Evaluating State-of-the-Art Speech Recognition Models and Compliant Subtitle Generation
title_full_unstemmed Speak2Subs: Evaluating State-of-the-Art Speech Recognition Models and Compliant Subtitle Generation
title_sort Speak2Subs: Evaluating State-of-the-Art Speech Recognition Models and Compliant Subtitle Generation
dc.creator.none.fl_str_mv Fresneda García, Julio
author Fresneda García, Julio
author_facet Fresneda García, Julio
author_role author
dc.contributor.none.fl_str_mv e-Spacio UNED
dc.subject.none.fl_str_mv 1203.04 Inteligencia artificial
ASR
LLM
Speech-To-Text
Subtitle
topic 1203.04 Inteligencia artificial
ASR
LLM
Speech-To-Text
Subtitle
description With recent advances in large language models, the evolution of speech-to-text tasks has been exponential. While state-of-the-art automatic speech recognition (ASR) models have taken a big step in speech transcription, creating quality subtitles still requires human intervention. This project has two main aspects: evaluating cutting-edge ASR models for speech-to-text, and developing a package that uses these ASR models to generate high-quality, compliant subtitles. ASR models do not inherently provide results suitable for subtitles. Therefore, one of the primary objectives of this package is to take the output generated by ASR models and enhance it into subtitles that require minimal human modification. This enhancement is necessary because ASR models alone cannot produce subtitles that meet the required quality standards. Speak2Subs has achieved this goal: it is a tool that produces high-quality subtitles with minimal human interaction.
publishDate 2024
dc.date.none.fl_str_mv 2024
2024-06-11
2024
2024-02-01
2024
2024-02-01
dc.type.none.fl_str_mv master thesis
http://purl.org/coar/resource_type/c_bdcc
dc.type.openaire.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
dc.identifier.none.fl_str_mv https://hdl.handle.net/20.500.14468/22596
url https://hdl.handle.net/20.500.14468/22596
dc.language.none.fl_str_mv Inglés
eng
language_invalid_str_mv Inglés
language eng
dc.rights.none.fl_str_mv open access
http://purl.org/coar/access_right/c_abf2
info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
rights_invalid_str_mv open access
http://purl.org/coar/access_right/c_abf2
https://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática.
publisher.none.fl_str_mv Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática.
dc.source.none.fl_str_mv reponame:e-spacio. Repositorio Institucional de la UNED
instname:Universidad Nacional de Educación a Distancia
instname_str Universidad Nacional de Educación a Distancia
reponame_str e-spacio. Repositorio Institucional de la UNED
collection e-spacio. Repositorio Institucional de la UNED