Fast k most similar neighbor classifier for mixed data (tree k-MSN)

Bibliographic details
Authors: SELENE HERNANDEZ RODRIGUEZ, JOSE FRANCISCO MARTINEZ TRINIDAD, JESUS ARIEL CARRASCO OCHOA
Resource type: article
Status: Version accepted for publication
Publication date: 2010
Country: Mexico
Institution: Instituto Nacional de Astrofísica, Óptica y Electrónica
Repository: Repositorio Institucional del INAOE
Language: English
OAI identifier: oai:inaoe.repositorioinstitucional.mx:1009/1404
Online access: http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/1404
Access level: open access
Keywords: Nearest neighbor rule; Fast k nearest neighbor search; Mixed data; Non-metric comparison functions
CTI classification codes: 1, 12, 1203
Description
Abstract: The k nearest neighbor (k-NN) classifier is a widely used nonparametric technique in Pattern Recognition because of its simplicity and good performance. To decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify and the prototypes in the training set T. However, when T is large, this exhaustive comparison is expensive. For this reason, many fast k-NN classifiers have been developed, some of which are based on a tree structure built from the prototypes in T during a preprocessing phase. Then, in a search phase, the tree is traversed to find the k nearest neighbors. The speed-up comes from avoiding the exploration of some parts of the tree by means of pruning rules, which are usually based on the triangle inequality. However, in soft sciences such as Medicine, Geology, and Sociology, prototypes are usually described by both numerical and categorical attributes (mixed data), and the comparison function used to compute the similarity between prototypes sometimes does not satisfy metric properties. Therefore, this work proposes an approximate fast k most similar neighbor classifier based on a tree structure (Tree k-MSN) for mixed data and for similarity functions that do not satisfy metric properties. Experiments with synthetic and real data are presented.
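The abstract describes the exhaustive comparison that fast classifiers try to avoid, and notes that comparison functions for mixed data may not be metric. The minimal Python sketch below illustrates only that exhaustive baseline, not the Tree k-MSN algorithm itself, whose details are not given in this record. The comparison function (exact match for categorical attributes, a threshold `eps` on the absolute difference for numerical ones, averaged over all attributes) and every name in the code (`attribute_similarity`, `similarity`, `k_msn_classify`, `eps`) are hypothetical choices made for illustration, not the authors' definitions.

```python
from collections import Counter
from typing import List, Sequence, Tuple, Union

# A mixed-data attribute value: numerical (float) or categorical (str).
Value = Union[float, str]


def attribute_similarity(x: Value, y: Value, eps: float) -> float:
    """Hypothetical per-attribute comparison criterion (not the paper's).

    Categorical values are similar only if they are equal; numerical values
    are similar if they differ by less than the threshold eps.
    """
    if isinstance(x, str) or isinstance(y, str):
        return 1.0 if x == y else 0.0
    return 1.0 if abs(x - y) < eps else 0.0


def similarity(a: Sequence[Value], b: Sequence[Value], eps: float = 1.0) -> float:
    """Fraction of attributes on which prototypes a and b are similar (0..1)."""
    return sum(attribute_similarity(x, y, eps) for x, y in zip(a, b)) / len(a)


def k_msn_classify(
    query: Sequence[Value],
    training_set: Sequence[Sequence[Value]],
    labels: Sequence[str],
    k: int = 3,
    eps: float = 1.0,
) -> str:
    """Exhaustive k most similar neighbor rule: compare the query with every prototype in T."""
    scored: List[Tuple[float, str]] = [
        (similarity(query, p, eps), label) for p, label in zip(training_set, labels)
    ]
    # Most similar first; the sort is stable, so ties keep training-set order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Majority vote among the k most similar neighbors.
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

For example, with T = [(1.0, "red", 5.2), (1.3, "red", 5.0), (8.0, "blue", 0.5), (7.6, "blue", 1.0), (1.1, "blue", 4.8)] labeled ["A", "A", "B", "B", "A"], the query (1.2, "red", 5.1) is classified as "A" with k = 3. The threshold criterion also shows why triangle-inequality pruning can fail: with `eps = 1.0` and the dissimilarity 1 - similarity, the one-attribute prototypes a = (0.0,), b = (0.9,) and c = (1.8,) give d(a, b) = d(b, c) = 0 but d(a, c) = 1.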