Building a dataset of emotions with distant supervision

Master's thesis in Theoretical and Applied Linguistics. Supervisor: Dr. Núria Bel

Bibliographic details
Author: Schaefer Trindade, Luísa
Format: master's thesis
Publication date: 2024
Country: Spain
Resources: Various* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya)
Repository: Recercat. Dipósit de la Recerca de Catalunya
OAI Identifier: oai:recercat.cat:10230/69824
Online access: http://hdl.handle.net/10230/69824
Access level: open access
Keywords: Distant supervision
Emotion lexicon
Annotated dataset
Portuguese
id ES_7299baec5ce856426a9cf437aa048ab9
Abstract: In Natural Language Processing (NLP), emotion detection is a challenging text classification problem. Using supervised machine learning to tackle this task requires annotated datasets, which can be difficult to come by because they are costly to produce. Moreover, emotions are subjective, and human annotators often disagree in their assessments. Recently, many methods have been proposed to reduce costs, including distant supervision. This thesis presents a strategy for annotating emotions in literary works in Brazilian Portuguese. Using a combination of regular expressions for automatic dialogue extraction, spaCy, and a lexicon containing 26 emotions, we classify dialogue by considering words used by the narrator to introduce and describe it. The results are mixed, given the large set of emotion labels, many of which are underrepresented in our data collection efforts. However, this strategy can still benefit the annotation of literary corpora with more common emotions such as Happiness and Dissatisfaction.
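The annotation strategy summarized in the abstract (extract dialogue with regular expressions, then label it using emotion words found in the narrator's accompanying text) can be illustrated with a minimal sketch. This is not the thesis's implementation: the regular expression, the lexicon entries, and the function names are all hypothetical, and a small hand-written lemma table stands in for spaCy's lemmatizer so the example runs without external dependencies.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon (lemma -> emotion label). The thesis uses a
# lexicon of 26 emotions; these four entries are invented for illustration.
EMOTION_LEXICON = {
    "sorrir": "Happiness",
    "alegre": "Happiness",
    "resmungar": "Dissatisfaction",
    "irritado": "Dissatisfaction",
}

# Hypothetical lemma table standing in for a spaCy lemmatizer.
LEMMAS = {
    "sorriu": "sorrir",
    "sorrindo": "sorrir",
    "resmungou": "resmungar",
    "irritada": "irritado",
}

# Assumed dialogue convention: a line opens with a dash (U+2014, U+2013 or a
# hyphen), and a second dash surrounded by spaces introduces the narrator.
DIALOGUE_RE = re.compile(
    r"^\s*[\u2014\u2013-]\s*(?P<speech>.+?)\s+[\u2014\u2013-]\s+(?P<narrator>.+)$"
)


def extract_dialogue(line):
    """Return (speech, narrator_text), or None if the line does not match."""
    match = DIALOGUE_RE.match(line)
    if match is None:
        return None
    return match.group("speech"), match.group("narrator")


def label_from_narrator(narrator_text):
    """Return the most frequent lexicon emotion in the narrator text, or None."""
    counts = Counter()
    for token in re.findall(r"\w+", narrator_text.lower()):
        lemma = LEMMAS.get(token, token)
        emotion = EMOTION_LEXICON.get(lemma)
        if emotion is not None:
            counts[emotion] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def annotate(lines):
    """Pair each extracted speech with the emotion signalled by its narrator."""
    labeled = []
    for line in lines:
        parsed = extract_dialogue(line)
        if parsed is None:
            continue
        speech, narrator_text = parsed
        label = label_from_narrator(narrator_text)
        if label is not None:
            labeled.append((speech, label))
    return labeled


sample = [
    "\u2014 Que dia lindo! \u2014 disse ela, sorrindo.",
    "\u2014 N\u00e3o quero mais isso \u2014 resmungou o menino.",
    "Ela saiu de casa cedo.",
]
print(annotate(sample))
```

Lines with no narrator segment, or whose narrator text contains no lexicon word, are left unlabeled rather than guessed. With a label set as large as 26 emotions, that filtering is one plausible reason many labels end up with few examples.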
publishDate 2024
dc.date.none.fl_str_mv 2024
2025
2025
dc.language.none.fl_str_mv English
dc.rights.none.fl_str_mv Creative Commons license, Attribution-NonCommercial-NoDerivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/deed.ca
info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf