Annotation of regular polysemy: an empirical assessment of the underspecified sense
| Author: | |
|---|---|
| Resource type: | doctoral thesis |
| Status: | Published version |
| Publication date: | 2013 |
| Country: | Spain |
| Institution: | CBUC, CESCA |
| Repository: | TDR. Tesis Doctorales en Red |
| OAI Identifier: | oai:www.tdx.cat:10803/145324 |
| Online access: | http://hdl.handle.net/10803/145324 |
| Access Level: | open access |
| Keywords: | Polisèmia (polysemy); Tractament automàtic de la parla (automatic speech processing); 81 |
| Summary: | Words that belong to a semantic type, like location, can metonymically behave as a member of another semantic type, like organization. This phenomenon is known as regular polysemy. In Pustejovsky's (1995) Generative Lexicon, some cases of regular polysemy are grouped in a complex semantic class called a dot type. For instance, the sense alternation mentioned above is the location•organization dot type. Other dot types are, for instance, animal•meat or container•content. We refer to the usages of dot-type words that are potentially both metonymic and literal as underspecified. Regular polysemy has received a lot of attention from the theory of lexical semantics and from computational linguistics. However, there is no consensus on how to represent the sense of underspecified examples at the token level, namely when annotating or disambiguating senses of dot types. This leads us to the main research question of the dissertation: Does sense underspecification justify incorporating a third sense into our sense inventories when dealing with dot types at the token level, thereby treating the underspecified sense as independent from the literal and metonymic? We have conducted an analysis in English, Danish and Spanish of whether humans can annotate underspecified senses. If humans cannot consistently annotate the underspecified sense, its applicability to NLP tasks is to be called into question. We have then tried to replicate the human judgments by means of unsupervised and semisupervised sense prediction. Achieving an NLP method that can reproduce the human judgments for the underspecified sense would be sufficient to postulate the inclusion of the underspecified sense in our sense inventories. The human annotation task has yielded results that indicate that the kind of annotator (volunteer vs. crowdsourced from Amazon Mechanical Turk) is a decisive factor in the recognizability of the underspecified sense. This sense distinction is too nuanced to be recognized using crowdsourced annotations. The automatic sense-prediction systems have been unable to find empirical evidence for the underspecified sense, even though the semisupervised system recognizes the literal and metonymic senses with good performance. In this light, we propose an alternative representation for the sense alternation of dot-type words where literal and metonymic are poles in a continuum, instead of discrete categories. |