The mRMR variable selection method: a comparative study for functional data

Bibliographic Details
Authors: Berrendero Díaz, José Ramón; Cuevas González, Antonio; Torrecilla, J.L.
Format: article
Publication Date: 2015
Country: Spain
Institution: Universidad Autónoma de Madrid
Repository: Biblos-e Archivo. Repositorio Institucional de la UAM
Language: English
OAI Identifier: oai:repositorio.uam.es:10486/674493
Online Access: http://hdl.handle.net/10486/674493
https://dx.doi.org/10.1080/00949655.2015.1042378
Access Level: Open access
Keywords: Distance correlation; Functional data analysis; Supervised classification; Variable selection; Mathematics
Description
Summary: The use of variable selection methods is particularly appealing in statistical problems with functional data. The obvious general criterion for variable selection is to choose the ‘most representative’ or ‘most relevant’ variables. However, a purely relevance-oriented criterion could lead to selecting many redundant variables. The minimum Redundancy Maximum Relevance (mRMR) procedure, proposed by Ding and Peng (2005) and Peng et al. (2005), is an algorithm that performs variable selection systematically, achieving a reasonable trade-off between relevance and redundancy. In its original form, this procedure uses the so-called mutual information criterion to assess relevance and redundancy. Keeping the focus on functional data problems, we propose here a modified version of the mRMR method, obtained by replacing the mutual information with the association measure called distance correlation, suggested by Székely et al. (2007). We have also performed an extensive simulation study, including 1600 functional experiments (100 functional models × 4 sample sizes × 4 classifiers) and three real-data examples, aimed at comparing the different versions of the mRMR methodology. The results are quite conclusive in favour of the newly proposed alternative.
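
The selection scheme summarised above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. It assumes that every curve is observed on a common grid, so that column j of X holds the curve values at the j-th grid point. It uses the "difference" form of the mRMR score (relevance minus mean redundancy with the variables already chosen), which may not be the only form compared in the study. The function names distance_correlation and mrmr_select are hypothetical.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation (Szekely et al., 2007) of two 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise absolute distances within each sample.
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double-centre each distance matrix (subtract row and column means, add grand mean).
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()                              # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())    # product of distance std. devs.
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def mrmr_select(X, y, n_select, assoc=distance_correlation):
    """Greedy mRMR: pick variables with high relevance and low mean redundancy.

    X        : (n_samples, n_vars) array, one column per grid point (assumption).
    y        : (n_samples,) response, e.g. 0/1 class labels.
    n_select : number of variables to keep (at least 1).
    assoc    : association measure; distance correlation here, whereas the
               original mRMR uses mutual information.
    """
    if n_select < 1:
        raise ValueError("n_select must be at least 1")
    X = np.asarray(X, dtype=float)
    n_vars = X.shape[1]
    # Relevance: association between each candidate variable and the response.
    relevance = np.array([assoc(X[:, j], y) for j in range(n_vars)])
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(n_vars)
    while len(selected) < min(n_select, n_vars):
        last = selected[-1]
        # Accumulate association with the most recently selected variable.
        redundancy_sum += np.array([assoc(X[:, j], X[:, last]) for j in range(n_vars)])
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf  # never pick the same variable twice
        selected.append(int(np.argmax(score)))
    return selected
```

Calling mrmr_select(X, y, n_select=5) returns the column indices of the five selected grid points in the order they were chosen. Passing a different function as assoc, for example a mutual information estimate, gives the corresponding alternative version of the procedure.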