Reproducibility dataset for a benchmark of biomedical semantic measures libraries


Bibliographic details
Authors: Lastra-Díaz, Juan J., Lara-Clares, Alicia, Garcia-Serrano, Ana M.
Format: dataset
Status: Published version
Publication date: 2020
Country: Spain
Resources: Consorcio Madroño
Repository: e-cienciaDatos, Repositorio de Datos del Consorcio Madroño
OAI identifier: doi:10.21950/OTDA4Z
Online access: https://doi.org/10.21950/OTDA4Z
Access level: open access
Keywords: Engineering
HESML
Docker
semantic measures library
Ontology-based semantic similarity measures
Information Content (IC) models
UMLS
SNOMED-CT US
MeSH
Gene Ontology
Description
Abstract: This dataset provides a set of reproducibility resources that allow the exact replication of the experiments introduced in our companion paper, which compares the performance of the three UMLS-based semantic similarity libraries reported in the literature: (1) UMLS::Similarity [20], (2) the Semantic Measures Library (SML) [3], and (3) our Half-Edge Semantic Measures Library (HESML), whose latest version is introduced in the aforementioned companion paper. HESML V1R5, the fifth release of the library detailed in [15], is a linearly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models for ontologies such as WordNet, SNOMED-CT, MeSH and GO.

This dataset provides a self-contained reproducibility platform containing the Java source code and binaries of our main benchmark program, as well as a Docker image that allows the exact replication of our experiments on any software platform supported by Docker, such as Linux-based operating systems, Windows or macOS. Our benchmark program is distributed with the UMLS SNOMED-CT and MeSH ontologies by courtesy of the US National Library of Medicine (NLM), together with all the software components needed to simplify the setup process. Our Docker image provides an exact virtual replica of the machine on which we ran our experiments, thus removing the need to carry out any tedious setup process, such as setting up the UMLS Metathesaurus in a MySQL database, the UMLS::Similarity library and other software components.
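To illustrate what "ontology-based semantic similarity measures and IC models" means in practice, the following is a minimal sketch under stated assumptions, not HESML code. It computes the intrinsic IC model of Seco et al. (IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of descendants of c and N the number of concepts) on a small is-a taxonomy, and then uses it in the classic Resnik and Lin similarity measures. The class name `IcSimilaritySketch`, its methods and the toy disease taxonomy are hypothetical and chosen for illustration only; they are not the HESML API and are not taken from SNOMED-CT, MeSH or GO.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of intrinsic-IC similarity on a toy is-a taxonomy (NOT the HESML API). */
public class IcSimilaritySketch {
    // is-a edges stored in both directions; every concept gets an entry in both maps
    private final Map<String, Set<String>> parents = new HashMap<>();
    private final Map<String, Set<String>> children = new HashMap<>();

    public void addIsA(String child, String parent) {
        parents.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
        children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
        parents.computeIfAbsent(parent, k -> new HashSet<>());
        children.computeIfAbsent(child, k -> new HashSet<>());
    }

    // Breadth-first closure along one edge direction; the result includes the start concept
    private static Set<String> closure(String start, Map<String, Set<String>> edges) {
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String c = queue.poll();
            if (seen.add(c)) {
                queue.addAll(edges.getOrDefault(c, Set.of()));
            }
        }
        return seen;
    }

    // Seco et al. intrinsic IC: 1 - log(hypo(c) + 1) / log(N); root -> 0, leaves -> 1
    public double secoIc(String c) {
        int n = parents.size();                      // N: number of concepts (assumes N > 1)
        int hypo = closure(c, children).size() - 1;  // descendants of c, excluding c itself
        return 1.0 - Math.log(hypo + 1) / Math.log(n);
    }

    // Resnik: IC of the most informative common ancestor (ancestors include the concept itself)
    public double resnik(String a, String b) {
        Set<String> common = closure(a, parents);
        common.retainAll(closure(b, parents));
        double best = 0.0;
        for (String c : common) {
            best = Math.max(best, secoIc(c));
        }
        return best;
    }

    // Lin: 2 * IC(MICA) / (IC(a) + IC(b)), guarding the degenerate root-vs-root case
    public double lin(String a, String b) {
        double denom = secoIc(a) + secoIc(b);
        if (denom == 0.0) {
            return a.equals(b) ? 1.0 : 0.0;
        }
        return 2.0 * resnik(a, b) / denom;
    }

    // Hypothetical toy taxonomy used only for this illustration
    public static IcSimilaritySketch toyTaxonomy() {
        IcSimilaritySketch t = new IcSimilaritySketch();
        t.addIsA("infection", "disease");
        t.addIsA("cardiovascular disease", "disease");
        t.addIsA("pneumonia", "infection");
        t.addIsA("sepsis", "infection");
        t.addIsA("myocarditis", "cardiovascular disease");
        return t;
    }

    public static void main(String[] args) {
        IcSimilaritySketch t = toyTaxonomy();
        System.out.printf("Resnik(pneumonia, sepsis) = %.4f%n", t.resnik("pneumonia", "sepsis"));
        System.out.printf("Lin(pneumonia, myocarditis) = %.4f%n", t.lin("pneumonia", "myocarditis"));
    }
}
```

In this toy setting, concepts sharing a more specific ancestor (pneumonia and sepsis under infection) score higher than concepts that meet only at the root (pneumonia and myocarditis). Real libraries such as those benchmarked here work on ontologies with hundreds of thousands of concepts, which is why efficient ancestor and descendant queries matter far more than in this naive breadth-first sketch.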