Group linear algorithm with sparse principal decomposition: a variable selection and clustering method for generalized linear models

Bibliographic Details
Authors: Laria, Juan C.; Lillo, Rosa E.; Aguilera-Morillo, M. Carmen (ORCID: 0000-0003-1027-9773)
Resource type: article
Publication date: 2022
Country: Spain
Institution: Universitat Politècnica de València (UPV)
Repository: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Language: English
OAI Identifier: oai:riunet.upv.es:10251/197838
Online access: https://riunet.upv.es/handle/10251/197838
Access level: open access
Keywords: Regression; Classification; Feature clustering; Statistical computing; Statistics and Operations Research
Description
Summary: This paper introduces the Group Linear Algorithm with Sparse Principal decomposition, an algorithm for supervised variable selection and clustering. Our approach extends Sparse Group Lasso regularization so that clusters are computed as part of the model fit. Unlike Sparse Group Lasso, it therefore does not require the variable clusters to be specified in advance. To determine the clusters, we solve a particular case of sparse Singular Value Decomposition, with a regularization term that follows naturally from the Group Lasso penalty. The paper also proposes a unified implementation that handles, among other models, linear regression, logistic regression, and proportional hazards models with right-censoring. The methodology is evaluated on both biological and simulated data, and details of the R implementation and of the hyperparameter search are discussed.
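
For readers unfamiliar with the Sparse Group Lasso regularization that the summary refers to, the following is a minimal sketch of the penalty in its commonly used form, lam * ((1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2 + alpha * ||beta||_1). It is not the paper's algorithm: the function name, the arguments `lam` and `alpha`, and the example values are assumptions made for illustration, and the groups are fixed by hand here, which is precisely the step the paper's method is said to avoid.

```python
import math


def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Evaluate the standard Sparse Group Lasso penalty (illustrative only).

    beta   -- list of coefficients
    groups -- list of index lists partitioning the coefficients
              (assumed known here; the paper estimates them instead)
    lam    -- overall regularization strength (hypothetical name)
    alpha  -- mix between the lasso term (alpha) and the group term (1 - alpha)
    """
    # Lasso part: L1 norm of all coefficients.
    l1_term = sum(abs(b) for b in beta)

    # Group lasso part: Euclidean norm of each group, weighted by sqrt(group size).
    group_term = 0.0
    for idx in groups:
        group_norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        group_term += math.sqrt(len(idx)) * group_norm

    return lam * ((1.0 - alpha) * group_term + alpha * l1_term)


# Hypothetical example: four coefficients in two hand-picked groups of two.
beta = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
penalty = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5)
lasso_only = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=1.0)
```

Setting `alpha=1.0` reduces the expression to the plain lasso penalty, while `alpha=0.0` leaves only the group term, which shrinks whole groups toward zero together.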