Personality regression from multimodal dyadic data
| Author: | |
|---|---|
| Format: | Master's thesis |
| Publication date: | 2021 |
| Country: | Spain |
| Institution: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | English |
| OAI Identifier: | oai:upcommons.upc.edu:2117/352537 |
| Online access: | https://hdl.handle.net/2117/352537 |
| Access Level: | open access |
| Keywords: | Machine learning; Computer vision; Artificial intelligence; Reconeixement de personalitat; Multimodal; Trets de personalitat; OCEAN; Interacció humana; Diàdic; Aprenentatge profund; Personality recognition; Personality traits; Human interaction; Dyadic; Deep learning; Aprenentatge automàtic; Visió per ordinador; Intel·ligència artificial; Àrees temàtiques de la UPC::Informàtica |
| Abstract: | Personality is made up of broad traits that are relatively stable over time and make it possible to differentiate one person from another. The most widely accepted theory for modeling personality is the Big-Five model, which defines each trait as a spectrum, making it possible to rank individuals and measure differences between their personalities. Humans infer the personality of others by observing verbal and non-verbal cues across different modalities, capturing patterns from speech, body gestures, and facial expressions, among others. This Master's thesis proposes a multimodal model that extracts audiovisual features using state-of-the-art methods to infer the personality of a target person in a dyadic scenario. The model is trained on the UDIVA dataset, a multimodal dataset of non-scripted face-to-face dyadic interactions based on free and structured tasks that elicit different behaviors and cognitive workloads in the participants. All sessions are conducted in a controlled environment, and the personality of the participants is obtained through self-reported assessments. We investigate the effect of the audio and video modalities on personality recognition, both separately and jointly, analyzing performance overall, by session, by participant, and by task. Furthermore, we evaluate how model performance changes when a larger range of visual and acoustic cues is considered before producing the prediction. The results of an incremental study show that the model's performance improves when long-range visual and acoustic features are combined, with significant improvements in most metrics over the previous state-of-the-art model. The results are very promising considering that our model was trained on a smaller part of the dataset, with fewer modalities, and in a multi-task manner (a single model for all tasks). |