MSC+: Language pattern learning for word sense induction and disambiguation

Bibliographic Details
Authors: Bif Goularte, Fábio; Sorato, Danielly; Modesto Nassar, Silvia; Fileto, Renato; Saggion, Horacio
Resource type: article
Status: Accepted version for publication
Publication date: 2019
Country: Spain
Institution: Various* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya)
Repository: Recercat. Dipósit de la Recerca de Catalunya
OAI Identifier: oai:recercat.cat:10230/46030
Online access: http://hdl.handle.net/10230/46030
http://dx.doi.org/10.1016/j.knosys.2019.105017
Access level: open access
Keywords: Lexical semantics
Information extraction
Linguistic pattern mining
Word sense induction
Word sense disambiguation
Description
Summary: Identifying the correct meaning of words in context or discovering new word senses is particularly useful for several tasks such as question answering, information extraction, information retrieval, and text summarization. However, especially in the context of user-generated content and online communication (e.g. Twitter), new meanings are continuously crafted by speakers as existing words are used in novel contexts. Consequently, lexical semantic inventories and systems have difficulty coping with semantic drift. In this work, we propose an approach to induce and disambiguate the senses of some target words in collections of short texts, such as tweets, through the use of fuzzy lexico-semantic patterns that we define as sequences of Morpho-semantic Components (MSC+). We learn these patterns, which we call MSC+ patterns, automatically from text data. Experimental results show that instances of some patterns arise in a number of tweets, though sometimes with different words conveying the sense of the respective MSC+. Exploiting MSC+ patterns when they induce semantics on target words enables effective word sense disambiguation mechanisms, leading to improvements over the state of the art.