New insights into evaluation of regression models through a decomposition of the prediction errors: application to near-infrared spectral data


Bibliographic Details
Authors: Sánchez Rodríguez, María Isabel; Sánchez-López, Elena; Caridad, Jose María; Marinas, Alberto; Marinas, Jose Maria; Urbano, Francisco José
Resource type: article
Publication date: 2013
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2099/13769
Online access: https://hdl.handle.net/2099/13769
Access level: open access
Keywords: Mathematical statistics
partial least squares
principal components
multivariate calibration
NIR spectroscopy
Mathematical statistics (Catalan: Estadística matemàtica)
AMS Classification::62 Statistics::62H Multivariate analysis
AMS Classification::62 Statistics::62J Linear inference, regression
AMS Classification::62 Statistics::62Q05 Statistical tables
UPC subject areas::Mathematics and statistics::Mathematical statistics
Description
Summary: This paper analyzes the performance of linear regression models according to the usual criteria: the number of principal components or latent factors, the goodness of fit, and the predictive capability. Other comparison criteria, more common in economic applications, are also considered: the degree of multicollinearity and a decomposition of the mean squared error of prediction that determines whether the prediction errors are systematic or random in nature. The applications use real near-infrared spectroscopy data from extra-virgin oil. The high dimensionality of the data is reduced by applying principal component analysis and partial least squares analysis. A possible improvement of these methods, using cluster analysis or the information in the relative maxima of the spectrum, is investigated. Finally, the results obtained are generalized via cross-validation and bootstrapping.
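
The record does not state which decomposition of the mean squared error the paper uses. As an illustration only, the sketch below assumes a Theil-style decomposition, common in econometric forecasting, that splits the MSE into a bias share and a regression share (both systematic) and a disturbance share (random). The function name `theil_decomposition` and the sample values are hypothetical and are not taken from the paper.

```python
import numpy as np


def theil_decomposition(actual, predicted):
    """Split the prediction MSE into bias, regression and disturbance shares.

    Assumed identity (Theil-style, population standard deviations):
        MSE = (mean_p - mean_a)^2 + (s_p - r*s_a)^2 + (1 - r^2)*s_a^2
    The first two terms are systematic error, the last one is random error.
    """
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)

    mse = np.mean((p - a) ** 2)
    s_a, s_p = a.std(), p.std()      # population std (ddof=0)
    r = np.corrcoef(a, p)[0, 1]      # correlation of actual vs predicted

    bias = (p.mean() - a.mean()) ** 2
    regression = (s_p - r * s_a) ** 2
    disturbance = (1.0 - r ** 2) * s_a ** 2

    # Shares of the MSE; they add up to 1 when mse > 0.
    return {
        "mse": mse,
        "bias": bias / mse,
        "regression": regression / mse,
        "disturbance": disturbance / mse,
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_pred = [1.2, 1.9, 3.4, 3.8, 5.3]
    for name, value in theil_decomposition(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

Under this assumption, a model whose error is dominated by the disturbance share has little systematic error left to correct, while large bias or regression shares point to a miscalibrated model. The sketch does not handle a perfect fit (zero MSE) or constant predictions, where the shares are undefined.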