Fast Most Similar Neighbor (MSN) classifiers for Mixed Data

Bibliographic Details
Author: Selene Hernández Rodríguez
Resource type: article
Status: Published version
Publication date: 2010
Country: Mexico
Institution: Instituto Nacional de Astrofísica, Óptica y Electrónica
Repository: Redalyc-INAOE
OAI Identifier: oai:redalyc.org:61519184008
Online access: https://www.redalyc.org/articulo.oa?id=61519184008
Access level: open access
Keywords: Computer Science
mixed data
Nearest neighbor rule
non-metric comparison functions
fast nearest neighbor search
Description
Summary: The k nearest neighbor (k-NN) classifier has been used extensively in Pattern Recognition because of its simplicity and good performance. However, in applications with large datasets, the exhaustive k-NN classifier becomes impractical. Many fast k-NN classifiers have therefore been developed; most of them rely on metric properties (usually the triangle inequality) to reduce the number of prototype comparisons. As a result, the existing fast k-NN classifiers are applicable only when the comparison function is a metric (commonly for numerical data). In sciences such as Medicine, Geology, and Sociology, however, the prototypes are usually described by both qualitative and quantitative features (mixed data), and the comparison function does not necessarily satisfy metric properties. For this reason, it is important to develop fast k most similar neighbor (k-MSN) classifiers for mixed data that use non-metric comparison functions. In this thesis, four fast k-MSN classifiers, following the most successful approaches, are proposed. Experiments on different datasets show that the proposed classifiers significantly reduce the number of prototype comparisons.
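To make the setting in the summary concrete, the following is a minimal sketch of the exhaustive k-MSN baseline that fast classifiers aim to improve on. It is not one of the four classifiers proposed in the thesis. The comparison function, the `tolerances` parameter, and the toy data are all assumptions introduced for illustration. The function is deliberately non-metric: numeric values within a tolerance count as equal, which breaks the triangle inequality, so pruning rules based on that inequality would not be safe here.

```python
from collections import Counter


def mixed_dissimilarity(a, b, tolerances):
    """Hypothetical comparison function for mixed data (an assumption,
    not taken from the source).

    Each feature contributes 0 if the two values are considered similar
    and 1 otherwise. Numeric features listed in `tolerances` are similar
    when they differ by at most the tolerance; all other features are
    qualitative and are similar only when equal.
    """
    total = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in tolerances:
            total += 0 if abs(x - y) <= tolerances[i] else 1
        else:
            total += 0 if x == y else 1
    return total


def k_msn_classify(query, prototypes, labels, k, tolerances):
    """Exhaustive k-MSN: compare the query with every prototype, then
    take a majority vote among the k most similar ones.

    Returns the predicted label and the number of prototype comparisons,
    which is the cost that fast classifiers try to reduce.
    """
    scored = sorted(
        (mixed_dissimilarity(query, p, tolerances), label)
        for p, label in zip(prototypes, labels)
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0], len(prototypes)


# Toy prototypes: (temperature, cough, risk group). Invented data.
prototypes = [
    (36.5, "no", "low"),
    (37.0, "no", "low"),
    (39.5, "yes", "high"),
    (39.2, "yes", "high"),
]
labels = ["healthy", "healthy", "sick", "sick"]
tolerances = {0: 0.5}  # feature 0 is numeric, tolerance of 0.5 degrees

label, comparisons = k_msn_classify(
    (39.0, "yes", "low"), prototypes, labels, k=3, tolerances=tolerances
)
print(label, comparisons)
```

The exhaustive search always performs one comparison per prototype, so its cost grows linearly with the size of the training set. With this comparison function, the values 0.0, 0.5, and 1.0 on a single numeric feature give pairwise dissimilarities of 0, 0, and 1, so the direct comparison exceeds the sum of the two intermediate ones.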