Deep regression of social signals in Dyadic Scenarios

Bibliographic Details
Author: Vidal Lucero, Ítalo
Resource type: master's thesis
Publication date: 2020
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI identifier: oai:upcommons.upc.edu:2117/336189
Online access: https://hdl.handle.net/2117/336189
Access level: open access
Keywords: Neural networks (Computer science)
Machine learning
emotion recognition
recurrent neural networks
feature extraction
multi-modal database
dyadic scenario
UPC subject areas::Computer science::Artificial intelligence
Description
Summary: The purpose of this project is to design a general system for emotion recognition from social signals in dyadic scenarios, using deep learning methods on raw audio, video and text-transcription data from publicly available database records. Automatic emotion recognition has attracted growing attention in the scientific community, both because of its many applications and because it is a step toward more accurate and complex empathic machines. This project proposes alternative utterance representations for multi-modal data generated from text, audio and video, in order to improve on the state-of-the-art deep-learning system for emotion recognition. The proposed framework is built on the IEMOCAP database but is general enough to be applied to any multi-modal database. The resulting system outperforms the state-of-the-art method and provides an informative analysis of the quality of the utterance representations. Finally, the conclusions of this work are presented, together with potential future lines of work on emotion recognition systems and emotion representations.
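The summary does not say how an utterance representation is built from the three modalities. Purely as an illustration, here is a minimal sketch of one common baseline, early fusion: each modality's frame-level feature vectors are mean-pooled into one fixed-length vector, and the per-modality vectors are concatenated in a fixed order. This is an assumption, not the method proposed in the thesis; the function names, modality keys and feature dimensions are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical sketch (not the thesis's method): early-fusion utterance
# representation built by mean-pooling each modality and concatenating.
Frame = List[float]


def mean_pool(frames: Sequence[Frame]) -> List[float]:
    """Average a sequence of equal-length frame vectors into one vector."""
    if not frames:
        raise ValueError("an utterance needs at least one frame per modality")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames in a modality must share one dimension")
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def utterance_representation(
    modalities: Dict[str, Sequence[Frame]],
    order: Sequence[str] = ("text", "audio", "video"),
) -> List[float]:
    """Mean-pool each modality, then concatenate the results in `order`."""
    vector: List[float] = []
    for name in order:
        vector.extend(mean_pool(modalities[name]))
    return vector


if __name__ == "__main__":
    # Toy utterance with illustrative (made-up) feature dimensions.
    utt = {
        "text": [[1.0, 2.0]],
        "audio": [[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]],
        "video": [[1.0], [3.0]],
    }
    print(utterance_representation(utt))  # [1.0, 2.0, 1.0, 2.0, 3.0, 2.0]
```

Averaging discards the temporal order within each modality. Recurrent neural networks, listed among the record's keywords, can instead consume the frame sequences directly.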