Fast anytime retrieval with confidence in large-scale temporal case bases

This work is about speeding up retrieval in Case-Based Reasoning (CBR) for large-scale case bases (CBs) comprised of temporally related cases in metric spaces. A typical example is a CB of electronic health records where consecutive sessions of a patient forms a sequence of related cases. k-Nearest...

Descripción completa

Detalles Bibliográficos
Autores: Mulayim, Mehmet Oguz, Arcos Rosell, Josep Lluís
Tipo de recurso: artículo
Estado:Versión aceptada para publicación
Fecha de publicación:2020
País:España
Institución:Consejo Superior de Investigaciones Científicas (CSIC)
Repositorio:DIGITAL.CSIC. Repositorio Institucional del CSIC
OAI Identifier:oai:digital.csic.es:10261/234632
Acceso en línea:http://hdl.handle.net/10261/234632
Access Level:acceso abierto
Palabra clave:Large-scale case-based reasoning
Exact and approximate k-nearest neighbor search
Anytime algorithms
Descripción
Sumario:This work is about speeding up retrieval in Case-Based Reasoning (CBR) for large-scale case bases (CBs) comprised of temporally related cases in metric spaces. A typical example is a CB of electronic health records where consecutive sessions of a patient forms a sequence of related cases. k-Nearest Neighbors (kNN) search is a widely used algorithm in CBR retrieval. However, brute-force kNN is impossible for large CBs. As a contribution to efforts for speeding up kNN search, we introduce an anytime kNN search methodology and algorithm. Anytime Lazy kNN finds exact kNNs when allowed to run to completion with remarkable gain in execution time by avoiding unnecessary neighbor assessments. For applications where the gain in exact kNN search may not suffice, it can be interrupted earlier and it returns best-so-far kNNs together with a confidence value attached to each neighbor. We describe the algorithm and methodology to construct a probabilistic model that we use both to estimate confidence upon interruption and to automatize the interruption at desired confidence thresholds. We present the results of experiments conducted with publicly available datasets. The results show superior gains compared to brute-force search. We reach to an average gain of 87.18% with 0.98 confidence and to 96.84% with 0.70 confidence.