Leveraging synonymy and polysemy to improve semantic similarity assessments based on intrinsic information content

Batet, Montserrat; Sanchez, David

Bibliographic details
Authors:	Batet, Montserrat; Sanchez, David
Resource type:	article
Status:	Published version
Publication date:	2020
Country:	Spain
Institution:	Universitat Oberta de Catalunya (UOC)
Repository:	O2, UOC institutional repository
OAI Identifier:	oai:openaccess.uoc.edu:10609/152519
Online access:	http://hdl.handle.net/10609/152519 ; https://doi.org/10.1007/s10462-019-09725-4
Access level:	open access
Keywords:	information content, ontology-based semantic similarity, synonymy, polysemy, WordNet

Description
Abstract:	Semantic similarity measures based on the estimation of the information content (IC) of concepts are currently regarded as the state of the art. Calculating the IC in an intrinsic (i.e., ontology-based) way is particularly convenient because it is accurate and does not depend on annotated corpora. Intrinsic IC calculation models estimate concept probabilities from the taxonomic knowledge modelled in an ontology (i.e., the number of hyponyms and/or hypernyms of each concept). In this paper, we aim to improve the intrinsic calculation of the IC by leveraging not only the hyponyms and hypernyms of concepts but also the explicit evidence of synonymy and polysemy that ontologies such as WordNet model. Specifically, we propose a more accurate intrinsic estimation of the concept probabilities on which the IC calculation relies. We evaluate the accuracy of our proposal through a set of comprehensive experiments in which our IC calculation model is tested on a variety of IC-based similarity measures and benchmarks. Experimental results show that our proposal consistently achieves good accuracy, which varies less across measures and benchmarks than that of the most prominent intrinsic IC calculation models available in the literature.
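The abstract states that intrinsic IC models derive concept probabilities from taxonomic structure alone. As a rough illustration of that baseline idea (not the authors' proposed model, whose synonymy- and polysemy-aware formula is not given in this record), the Python sketch below implements the well-known hyponym-based intrinsic IC of Seco et al. (2004), IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of hyponyms of c and N the number of concepts, and plugs it into Lin's (1998) IC-based similarity measure. The toy taxonomy `TAXONOMY` and all function names are hypothetical; a real application would read the taxonomy from WordNet.

```python
import math

# Hypothetical toy taxonomy: concept -> list of direct hypernyms (parents).
# Lists allow multiple inheritance, as in WordNet.
TAXONOMY = {
    "entity": [],
    "animal": ["entity"],
    "plant": ["entity"],
    "dog": ["animal"],
    "cat": ["animal"],
    "tree": ["plant"],
}


def ancestors(concept, taxonomy):
    """Return the concept itself plus all of its transitive hypernyms."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in taxonomy[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def hyponym_counts(taxonomy):
    """Count the distinct transitive hyponyms of every concept."""
    counts = {c: 0 for c in taxonomy}
    for c in taxonomy:
        # Each concept adds one to every one of its proper ancestors.
        for a in ancestors(c, taxonomy) - {c}:
            counts[a] += 1
    return counts


def intrinsic_ic(taxonomy):
    """Seco et al. (2004)-style intrinsic IC: 1 - log(hypo(c) + 1) / log(N)."""
    n = len(taxonomy)
    hypo = hyponym_counts(taxonomy)
    return {c: 1.0 - math.log(hypo[c] + 1) / math.log(n) for c in taxonomy}


def lin_similarity(a, b, taxonomy, ic):
    """Lin (1998): 2 * IC(LCS) / (IC(a) + IC(b)).

    The LCS is taken as the most informative common subsumer of a and b.
    """
    common = ancestors(a, taxonomy) & ancestors(b, taxonomy)
    ic_lcs = max((ic[c] for c in common), default=0.0)
    denom = ic[a] + ic[b]
    # Only the root (IC 0) compared with itself yields a zero denominator.
    return 2.0 * ic_lcs / denom if denom > 0 else 1.0


if __name__ == "__main__":
    ic = intrinsic_ic(TAXONOMY)
    print("IC:", {c: round(v, 4) for c, v in ic.items()})
    print("lin(dog, cat) =", round(lin_similarity("dog", "cat", TAXONOMY, ic), 4))
    print("lin(dog, tree) =", round(lin_similarity("dog", "tree", TAXONOMY, ic), 4))
```

With this toy taxonomy, leaves such as `dog` get IC 1, the root `entity` gets IC 0, and the Lin similarity of `dog` and `cat` equals the IC of their common hypernym `animal`. The authors' proposal changes how the concept probabilities behind the IC are estimated (by also exploiting synonymy and polysemy evidence), while IC-based similarity measures such as Lin's are applied as before.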
