Comparison of CCA and PLS to explore and model NIR data


Bibliographic Details
Authors: Gatius Cortiella, Ferran, Miralbés, Carlos, David, Calin, Puy Llorens, Jaume
Resource type: article
Status: Version accepted for publication
Publication date: 2017
Country: Spain
Institution: Various* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya)
Repository: Recercat. Dipósit de la Recerca de Catalunya
OAI Identifier: oai:recercat.cat:10459.1/467349
Online access: https://doi.org/10.1016/j.chemolab.2017.03.011
https://hdl.handle.net/10459.1/467349
Access level: open access
Keywords: Canonical Correlation Analysis
Partial Least Squares
NIR calibration
Regularization
Description
Summary: Partial Least Squares (PLS) regression is the most widely used technique for developing NIR calibrations. PLS uses several factors to reach the optimum models, which can help in the physical interpretation of the sources of correlation between the x and y variables. However, its later factors do not arise in order of explained variance. Canonical Correlation Analysis (CCA) overcomes this problem by selecting the latent variables as the directions of maximum x-y correlation. Calibration of moisture, crude protein, dry gluten and resistance of dough to deformation of wheat flour samples from NIR spectra is studied here using PLS-1, PLS-2, CCA-1 and CCA-2. The calibration set contains 429 samples, while 215 additional independent samples form the validation set. It is shown that a 2-D CCA-2 calibration model gathers the highest explained variance among the models studied. When the individual calibration models for each property are compared, CCA requires regularization to avoid instability of the regression coefficients. A regularization term that tends to reduce the regression coefficients was used, with the regularization parameter selected by either the Durbin-Watson test or the Test for Runs. Both statistical tests led to similar values of the regularization parameter, and the resulting regression coefficients and RMSEP of the CCA-1 models became similar to those of the PLS-1 models.
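
The regularization idea in the summary can be illustrated with a minimal sketch. It rests on several assumptions that the record does not confirm: the penalty is taken to be of ridge type (the summary only says the term "tends to reduce the regression coefficients"), the Durbin-Watson statistic is computed on the coefficient vector along the wavelength axis, and the data are synthetic rather than the wheat flour spectra. The function names `ridge_coefficients` and `durbin_watson` are hypothetical helpers written for this illustration. For a single response, the CCA direction in x is proportional to the least-squares coefficient vector, so a ridge-type penalty serves here as a stand-in for a regularized CCA-1 model.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Ridge-type coefficients b = (Xc'Xc + lam*I)^-1 Xc'yc on mean-centred data.

    Assumption: a ridge penalty stands in for the regularization term
    mentioned in the summary; the paper's exact term is not given here.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def durbin_watson(v):
    """Durbin-Watson statistic of a sequence: sum of squared successive
    differences divided by the sum of squares. It lies between 0 and 4;
    values near 2 indicate no serial correlation, values well below 2
    indicate positive correlation between neighbours (a smooth sequence)."""
    v = np.asarray(v, dtype=float)
    return float(np.sum(np.diff(v) ** 2) / np.sum(v ** 2))

# Synthetic "spectra": two smooth peaks scaled by random concentrations plus noise.
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 60, 40
grid = np.linspace(0.0, 1.0, n_wavelengths)
peaks = np.stack([np.exp(-((grid - c) ** 2) / 0.02) for c in (0.3, 0.7)])
conc = rng.normal(size=(n_samples, 2))
X = conc @ peaks + 0.01 * rng.normal(size=(n_samples, n_wavelengths))
y = conc[:, 0] + 0.05 * rng.normal(size=n_samples)

# Scan a few illustrative values of the regularization parameter.
lambdas = [1e-3, 1e-1, 1e1]
results = []
for lam in lambdas:
    b = ridge_coefficients(X, y, lam)
    results.append((lam, float(np.linalg.norm(b)), durbin_watson(b)))

for lam, norm_b, dw in results:
    print(f"lambda={lam:g}  ||b||={norm_b:.3f}  DW={dw:.3f}")
```

In this sketch the norm of the coefficient vector shrinks as the parameter grows, and the Durbin-Watson value gives one possible indication of how noisy the coefficient vector is along the wavelength axis. How the authors actually applied the tests when choosing the parameter should be checked in the article itself.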