Comparison of the variable importance in projection (VIP) and of the selectivity ratio (SR) methods for variable selection and interpretation


Bibliographic details
Authors: Farrés, Mireia, Platikanov, Stefan, Tsakovski, Stefan L., Tauler, Romà
Resource type: article
Status: Accepted version for publication
Publication date: 2015
Country: Spain
Institution: Consejo Superior de Investigaciones Científicas (CSIC)
Repository: DIGITAL.CSIC. Repositorio Institucional del CSIC
OAI Identifier: oai:digital.csic.es:10261/128171
Online access: http://hdl.handle.net/10261/128171
Access level: open access
Keywords: Partial least squares
Selectivity ratio
Variable importance in projection
Variable selection
Description
Abstract: This study compares the application of two variable selection methods in partial least squares regression (PLSR): the variable importance in projection (VIP) method and the selectivity ratio (SR) method. For this purpose, three different data sets were analysed: (a) physicochemical water quality parameters related to sensorial data, (b) gas chromatography-mass spectrometry (GC-MS) chemical (organic compound) profiles from fossil sea sediment samples related to sea surface temperature (SST) changes, and (c) exposed genes of Daphnia magna female samples related to their total offspring production. Correlation coefficients (r), significance levels (p-values) and the interpretation of the underlying experimental phenomena were used to discuss the best approach to variable selection in each case. For the first data set (water quality), the comparison of the two methods showed that the SR method is more accurate for sensorial prediction. For the climate data set, when raw total ion current (TIC) GC-MS chromatograms were considered, the variables selected with the VIP method were easier to interpret than those selected with the SR method. However, when only some chromatographic peak areas (concentrations) were considered, the SR method was more efficient for prediction, whereas the VIP method selected the most relevant variables for interpreting SST changes. Finally, for the transcriptomic data set, the SR method was again found to be more reliable for prediction purposes.
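The abstract does not spell out how either score is computed. The following is a minimal sketch, assuming the commonly cited definitions rather than the authors' own code: VIP combines each variable's PLS weights with the y-variance explained by each component, and SR is the ratio of explained to residual variance of each variable after target projection of X onto the normalised PLS regression vector. It fits a single-response PLS model (PLS1) with NIPALS in NumPy. The function names (`pls1_nipals`, `vip_scores`, `selectivity_ratio`), the synthetic data and the "VIP > 1" cut-off are illustrative assumptions.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with NIPALS on mean-centred data.

    Returns weights W (p x A), scores T (n x A), X-loadings P (p x A),
    y-loadings q (A,) and regression coefficients b (p,) for centred data.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xr = X - X.mean(axis=0)          # residual X matrix, deflated in the loop
    yr = y - y.mean()                # residual y vector, deflated in the loop
    n, p = Xr.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    P = np.zeros((p, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)       # unit-length weight vector
        t = Xr @ w                   # X scores for component a
        tt = t @ t
        P[:, a] = Xr.T @ t / tt      # X loadings
        q[a] = yr @ t / tt           # y loading
        Xr = Xr - np.outer(t, P[:, a])   # deflate X
        yr = yr - q[a] * t               # deflate y
        W[:, a], T[:, a] = w, t
    b = W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q
    return W, T, P, q, b


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a (w_ja/||w_a||)^2 / sum_a SS_a), SS_a = q_a^2 t_a't_a."""
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)         # y variance explained per component
    w_norm2 = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(p * (w_norm2 @ ss) / ss.sum())


def selectivity_ratio(X, b):
    """SR_j = explained / residual sum of squares of variable j after target projection."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    w_tp = b / np.linalg.norm(b)                 # target-projection weight vector
    t_tp = Xc @ w_tp                             # target-projected scores
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)           # target-projected loadings
    X_tp = np.outer(t_tp, p_tp)                  # part of X explained by the target component
    E_tp = Xc - X_tp                             # residual part of X
    return np.sum(X_tp ** 2, axis=0) / np.sum(E_tp ** 2, axis=0)


# Hypothetical synthetic example: only variables 0 and 1 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=50)

W, T, P, q, b = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
sr = selectivity_ratio(X, b)
y_hat = (X - X.mean(axis=0)) @ b + y.mean()
selected_by_vip = np.flatnonzero(vip > 1.0)      # common "VIP > 1" rule of thumb (assumption)
```

One useful property of this VIP definition is that the squared scores sum to the number of variables, which is why a threshold of 1 is often used. SR, by contrast, is computed from a single target-projected component, so it is tied directly to the regression vector used for prediction. This difference is consistent with the abstract's observation that SR tended to favour prediction while VIP sometimes aided interpretation.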