Personality regression from multimodal dyadic data

Bibliographic details
Author: Curto Janó, David
Format: Master's thesis
Publication date: 2021
Country: Spain
Resources: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/352537
Online access: https://hdl.handle.net/2117/352537
Access level: open access
Keywords: Machine learning
Computer vision
Artificial intelligence
Reconeixement de personalitat
Multimodal
Trets de personalitat
OCEAN
Interacció humana
Diàdic
Aprenentatge profund
Personality recognition
Personality traits
Human interaction
Dyadic
Deep learning
Aprenentatge automàtic
Visió per ordinador
Intel·ligència artificial
Àrees temàtiques de la UPC::Informàtica
Description
Abstract: Personality is made up of broad traits that are relatively stable over time and that make it possible to differentiate one person from another. The most widely accepted theory for modeling personality is the Big-Five model, which defines each trait as a spectrum, making it possible to rank and measure differences between individuals' personalities. Humans infer the personality of others by observing verbal and non-verbal cues across different modalities, capturing patterns from speech, body gestures, and facial expressions, among others. This Master's thesis proposes a multimodal model that extracts audiovisual features using state-of-the-art methods to infer the personality of a target person in a dyadic scenario. The model is trained on the UDIVA dataset, a multimodal dataset of non-scripted face-to-face dyadic interactions based on free and structured tasks that elicit different behaviors and cognitive workloads in the participants. All sessions are conducted in a controlled environment, and the personality of the participants is obtained through self-reported assessments. We investigate the effect of the audio and video modalities on personality recognition, both separately and jointly, analyzing performance overall and by session, participant, and task. Furthermore, we evaluate how providing the model with a larger range of visual and acoustic cues before it produces a prediction affects its performance. The results of an incremental study show that performance improves when long-range visual and acoustic features are combined, with significant improvements in most metrics over the previous state-of-the-art model. The results are very promising considering that our model has been trained on a smaller part of the dataset, with fewer modalities, and in a multi-task manner (a single model for all tasks).