Multimodal emotion recognition via face and voice

Bibliographic Details
Author: Griera i Jiménez, Oriol
Format: master thesis
Publication Date: 2022
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/374046
Online Access: https://hdl.handle.net/2117/374046
Access Level: Open access
Keyword: Computer vision
Deep learning
Emotion recognition
UPC subject areas::Telecommunications engineering::Signal processing::Image and video signal processing
UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
Description
Summary: Recent advances in technology have allowed humans to interact with computers in ways previously unimaginable. Despite this progress, a necessary element of natural interaction is still lacking: emotions. Emotions play an important role in human communication and interaction, allowing people to express themselves beyond the language domain. The purpose of this project is to develop a multimodal system that classifies emotions from the facial expressions and voice captured in videos. For face emotion recognition, face images and optical flow frames are used to exploit the spatial and temporal information in the videos. For the voice, the model predicts the emotion from speech features extracted from chunked audio signals. Combining the two modalities with score-level fusion achieves excellent performance on the RAVDESS and BAUM-1 datasets. However, the results highlight the need to further investigate the preprocessing techniques applied in this work to "normalize" the datasets to a unified format, with the aim of improving cross-dataset performance.
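The score-level fusion mentioned in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: the label set, the function names `mean_scores` and `fuse_scores`, the averaging of per-chunk voice scores, and the equal weighting of the two modalities are all assumptions made here for illustration. Each model is assumed to output one probability per emotion class.

```python
from typing import List, Sequence

# Hypothetical label set; the thesis datasets define their own classes.
EMOTIONS = ["neutral", "happy", "sad", "angry"]


def mean_scores(chunk_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-chunk class scores into one score vector per clip (assumed aggregation)."""
    n = len(chunk_scores)
    return [sum(column) / n for column in zip(*chunk_scores)]


def fuse_scores(face_scores: Sequence[float],
                voice_scores: Sequence[float],
                face_weight: float = 0.5) -> List[float]:
    """Weighted sum of the two modalities' class scores, renormalized to sum to 1."""
    if len(face_scores) != len(voice_scores):
        raise ValueError("both modalities must score the same classes")
    fused = [face_weight * f + (1.0 - face_weight) * v
             for f, v in zip(face_scores, voice_scores)]
    total = sum(fused)
    return [s / total for s in fused]


# Illustrative numbers only, not results from the thesis.
face = [0.1, 0.6, 0.2, 0.1]
voice_chunks = [[0.2, 0.5, 0.2, 0.1],
                [0.1, 0.3, 0.5, 0.1]]

fused = fuse_scores(face, mean_scores(voice_chunks))
label = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
```

Fusing at the score level keeps the face and voice models independent, so either one can be retrained or replaced without touching the other; only the weight (here the assumed `face_weight`) ties them together.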