Master Dissertation: Information Retrieval for Question Answering based on Distributed Representations


Bibliographic Details
Author: Sagrado Sala, Ana
Resource type: master's thesis
Publication date: 2022
Country: Spain
Institution: Universidad Nacional de Educación a Distancia
Repository: e-spacio. Repositorio Institucional de la UNED
Language: English
OAI Identifier: oai:e-spacio.uned.es:20.500.14468/14297
Online access: https://hdl.handle.net/20.500.14468/14297
Access Level: open access
Keyword: 1203 Computer science
Description
Summary: Commonly used information retrieval methods such as TF-IDF do not capture the semantics of the query or the document. This is a problem, especially when the words used in the queries do not appear in the documents. More research is therefore needed on how text semantics can be applied to information retrieval, especially when the document corpus is large and query and document representations must be compared quickly, without re-indexing. In this work, we conduct an exploratory study of different embeddings and deep learning techniques and how they can be applied to the information retrieval task. We show that, although existing methods based on word overlap perform better in general, semantic embeddings outperform bag-of-words methods in the particular cases where the word overlap between queries and documents is low.
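
Illustrative note (not part of the thesis): the vocabulary-mismatch problem described in the summary can be reproduced with a minimal sketch. The toy documents, the queries, the whitespace tokenizer, the smoothed IDF variant, and the helper names `tfidf_vectors` and `cosine` are all assumptions made for illustration; they are not taken from the dissertation.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text."""
    tokenized = [text.lower().split() for text in texts]
    n_texts = len(tokenized)
    # Document frequency: number of texts that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed IDF (always strictly positive).
            term: (count / len(tokens))
            * (math.log((1 + n_texts) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: the first document is the relevant one.
documents = [
    "the physician prescribed medication for the patient",
    "the stock market closed higher today",
]

# Same meaning as the first document, but no shared words.
paraphrase_query = "doctor gave medicine"
*doc_vecs, query_vec = tfidf_vectors(documents + [paraphrase_query])
paraphrase_scores = [cosine(query_vec, d) for d in doc_vecs]

# A query that reuses the document's own words is matched as expected.
overlap_query = "medication for the patient"
*doc_vecs, query_vec = tfidf_vectors(documents + [overlap_query])
overlap_scores = [cosine(query_vec, d) for d in doc_vecs]

print(paraphrase_scores)
print(overlap_scores)
```

With no shared terms, the paraphrased query scores zero against both documents, so a purely lexical ranker cannot tell the relevant document from the irrelevant one. An embedding-based retriever would replace `tfidf_vectors` with dense vectors from a trained model and could keep the same cosine comparison.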