Building a dataset of emotions with distant supervision
Master's thesis in Theoretical and Applied Linguistics. Supervisor: Dr. Núria Bel
| Author: | Schaefer Trindade, Luísa |
|---|---|
| Format: | master's thesis |
| Publication date: | 2024 |
| Country: | Spain |
| Resources: | Varias* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya) |
| Repository: | Recercat. Dipósit de la Recerca de Catalunya |
| OAI Identifier: | oai:recercat.cat:10230/69824 |
| Online access: | http://hdl.handle.net/10230/69824 |
| Access Level: | open access |
| Keywords: | Distant supervision; Emotion lexicon; Annotated dataset; Portuguese |
| id | ES_7299baec5ce856426a9cf437aa048ab9 |
|---|---|
| oai_identifier_str | oai:recercat.cat:10230/69824 |
| network_acronym_str | ES |
| network_name_str | España |
| repository_id_str | |
| spelling | Building a dataset of emotions with distant supervision; Schaefer Trindade, Luísa; Distant supervision; Emotion lexicon; Annotated dataset; Portuguese; Master's thesis in Theoretical and Applied Linguistics. Supervisor: Dr. Núria Bel; In Natural Language Processing (NLP), emotion detection is a challenging text classification problem. Using supervised machine learning to tackle this task requires annotated datasets, which can be difficult to come by because they are costly to produce. Moreover, emotions are subjective, and human annotators often disagree in their assessments. Recently, many methods have been proposed to reduce costs, including distant supervision. This thesis presents a strategy for annotating emotions in literary works in Brazilian Portuguese. Using a combination of regular expressions for automatic dialogue extraction, spaCy, and a lexicon containing 26 emotions, we classify dialogue by considering words used by the narrator to introduce and describe it. The results are mixed, given the large set of emotion labels, many of which are underrepresented in our data collection efforts. However, this strategy can still benefit the annotation of literary corpora with more common emotions such as Happiness and Dissatisfaction.; 2025; 2025; 2024; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10230/69824; reponame:Recercat. Dipósit de la Recerca de Catalunya; instname:Varias* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya); English; Creative Commons license, Attribution-NonCommercial-NoDerivatives 4.0 International; https://creativecommons.org/licenses/by-nc-nd/4.0/deed.ca; info:eu-repo/semantics/openAccess; oai:recercat.cat:10230/69824; 2026-05-29T05:05:01Z |
| dc.title.none.fl_str_mv | Building a dataset of emotions with distant supervision |
| title | Building a dataset of emotions with distant supervision |
| spellingShingle | Building a dataset of emotions with distant supervision; Schaefer Trindade, Luísa; Distant supervision; Emotion lexicon; Annotated dataset; Portuguese |
| title_short | Building a dataset of emotions with distant supervision |
| title_full | Building a dataset of emotions with distant supervision |
| title_fullStr | Building a dataset of emotions with distant supervision |
| title_full_unstemmed | Building a dataset of emotions with distant supervision |
| title_sort | Building a dataset of emotions with distant supervision |
| dc.creator.none.fl_str_mv | Schaefer Trindade, Luísa |
| author | Schaefer Trindade, Luísa |
| author_facet | Schaefer Trindade, Luísa |
| author_role | author |
| dc.subject.none.fl_str_mv | Distant supervision; Emotion lexicon; Annotated dataset; Portuguese |
| topic | Distant supervision; Emotion lexicon; Annotated dataset; Portuguese |
| description | Master's thesis in Theoretical and Applied Linguistics. Supervisor: Dr. Núria Bel |
| publishDate | 2024 |
| dc.date.none.fl_str_mv | 2024 2025 2025 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/10230/69824 |
| url | http://hdl.handle.net/10230/69824 |
| dc.language.none.fl_str_mv | English |
| language_invalid_str_mv | English |
| dc.rights.none.fl_str_mv | Creative Commons license, Attribution-NonCommercial-NoDerivatives 4.0 International; https://creativecommons.org/licenses/by-nc-nd/4.0/deed.ca; info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv | Creative Commons license, Attribution-NonCommercial-NoDerivatives 4.0 International; https://creativecommons.org/licenses/by-nc-nd/4.0/deed.ca |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.source.none.fl_str_mv | reponame:Recercat. Dipósit de la Recerca de Catalunya; instname:Varias* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya) |
| instname_str | Varias* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya) |
| reponame_str | Recercat. Dipósit de la Recerca de Catalunya |
| collection | Recercat. Dipósit de la Recerca de Catalunya |
| repository.name.fl_str_mv | |
| repository.mail.fl_str_mv | |
| _version_ | 1869410747570716672 |
| score | 15,81155 |
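
The abstract in this record describes labelling dialogue through the words the narrator uses to introduce it, combining regular expressions, spaCy, and an emotion lexicon. The following is only a minimal sketch of that general idea, not the thesis's implementation. It assumes dialogue lines open with a dash and that a second dash starts the narrator's comment; the lexicon entries, the `label_dialogue` function, and the plain lowercase token matching (used here in place of spaCy) are all hypothetical.

```python
import re

# Hypothetical toy lexicon: surface form -> emotion label. The thesis uses a
# lexicon with 26 emotions; only the two labels named in the abstract appear here.
TOY_LEXICON = {
    "sorriu": "Happiness",
    "alegre": "Happiness",
    "resmungou": "Dissatisfaction",
    "irritado": "Dissatisfaction",
}

DASH = "\u2014"  # dialogue dash commonly used in Portuguese fiction

# Assumed convention: the line opens with a dash, and a second dash
# separates the character's speech from the narrator's comment.
DIALOGUE_RE = re.compile(
    rf"^\s*{DASH}\s*(?P<speech>[^{DASH}]+?)\s*{DASH}\s*(?P<narration>.+)$"
)
TOKEN_RE = re.compile(r"\w+")


def label_dialogue(line, lexicon=TOY_LEXICON):
    """Return (speech, sorted emotion labels), or None if the line has no
    speech followed by a narrator comment."""
    match = DIALOGUE_RE.match(line)
    if match is None:
        return None
    # Plain lowercase token lookup; a real pipeline would likely lemmatize first.
    tokens = TOKEN_RE.findall(match["narration"].lower())
    labels = sorted({lexicon[tok] for tok in tokens if tok in lexicon})
    return match["speech"], labels


# Usage: the narrator's word "alegre" supplies the distant label.
print(label_dialogue("\u2014 Que dia lindo! \u2014 disse Ana, alegre."))
```

Matching surface forms keeps the sketch free of external dependencies, but it misses inflected variants of the lexicon words, which is one reason a lemmatizer would be useful in practice.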