An approach to the use of word embeddings in an opinion classification task
In this paper we show how a vector-based word representation obtained via word2vec can help to im- prove the results of a document classifier based on bags of words. Both models allow obtaining nu- meric representations from texts, but they do it very differently. The bag of words model can represen...
| Autores: | , , |
|---|---|
| Tipo de recurso: | artículo |
| Estado: | Versión enviada para evaluación y publicación |
| Fecha de publicación: | 2016 |
| País: | España |
| Institución: | Universidad de Sevilla (US) |
| Repositorio: | idUS. Depósito de Investigación de la Universidad de Sevilla |
| OAI Identifier: | oai:idus.us.es:11441/99325 |
| Acceso en línea: | https://hdl.handle.net/11441/99325 https://doi.org/10.1016/j.eswa.2016.09.005 |
| Access Level: | acceso abierto |
| Palabra clave: | Document classification Opinion classification Word embedding Bag of words Word2vec |
| Sumario: | In this paper we show how a vector-based word representation obtained via word2vec can help to im- prove the results of a document classifier based on bags of words. Both models allow obtaining nu- meric representations from texts, but they do it very differently. The bag of words model can representdocuments by means of widely dispersed vectors in which the indices are words or groups of words.word2vec generates word level representations building vectors that are much more compact, where in- dices implicitly contain information about the context of word occurrences. Bags of words are very effec- tive for document classification and in our experiments no representation using only word2vec vectorsis able to improve their results. However, this does not mean that the information provided by word2vecis not useful for the classification task. When this information is used in combination with the bags ofwords, the results are improved, showing its complementarity and its contribution to the task. We havealso performed cross-domain experiments in which word2vec has shown much more stable behaviorthan bag of words models. |
|---|