Double-Weighting for Covariate Shift Adaptation

Bibliographic Details
Authors: Segovia, J.I., Mazuelas, S., Liu, A.
Resource type: article
Status: Published version
Publication date: 2023
Country: Spain
Institution: Basque Center for Applied Mathematics (BCAM)
Repository: BIRD. BCAM's Institutional Repository Data
OAI Identifier: oai:bird.bcamath.org:20.500.11824/1765
Online access: http://hdl.handle.net/20.500.11824/1765
Access level: open access
Keywords: Covariate Shift, Supervised Classification, Selection Bias, Minimax Classification
Description
Summary: Supervised learning is often affected by a covariate shift in which the marginal distributions of instances (covariates $x$) of training and testing samples $p_\text{tr}(x)$ and $p_\text{te}(x)$ are different but the label conditionals coincide. Existing approaches address such covariate shift by either using the ratio $p_\text{te}(x)/p_\text{tr}(x)$ to weight training samples (reweighted methods) or using the ratio $p_\text{tr}(x)/p_\text{te}(x)$ to weight testing samples (robust methods). However, the performance of such approaches can be poor under support mismatch or when the above ratios take large values. We propose a minimax risk classification (MRC) approach for covariate shift adaptation that avoids such limitations by weighting both training and testing samples. In addition, we develop effective techniques that obtain both sets of weights and generalize the conventional kernel mean matching method. We provide novel generalization bounds for our method that show a significant increase in the effective sample size compared with reweighted methods. The proposed method also achieves enhanced classification performance in both synthetic and empirical experiments.
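
The limitation of reweighted methods mentioned in the summary can be illustrated with a small sketch. It is not the authors' method. It assumes one-dimensional Gaussian marginals with known parameters (in practice the ratio has to be estimated), computes the weights $p_\text{te}(x)/p_\text{tr}(x)$ for training instances, and summarizes them with Kish's effective sample size $(\sum_i w_i)^2 / \sum_i w_i^2$, a common heuristic that may differ from the quantity appearing in the paper's bounds. All function and variable names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, tr_mean, tr_std, te_mean, te_std):
    """Weights p_te(x) / p_tr(x) for training instances xs.

    Assumption: both marginals are Gaussians with known parameters.
    """
    return [
        gaussian_pdf(x, te_mean, te_std) / gaussian_pdf(x, tr_mean, tr_std)
        for x in xs
    ]


def kish_effective_sample_size(weights):
    """Kish's heuristic (sum w)^2 / sum w^2; at most len(weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Training instances drawn from p_tr = N(0, 1), with a fixed seed.
rng = random.Random(0)
train_x = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Testing marginals N(0, 1), N(0.5, 1), and N(2, 1): no, mild, strong shift.
no_shift = importance_weights(train_x, 0.0, 1.0, 0.0, 1.0)
mild_shift = importance_weights(train_x, 0.0, 1.0, 0.5, 1.0)
strong_shift = importance_weights(train_x, 0.0, 1.0, 2.0, 1.0)

ess_none = kish_effective_sample_size(no_shift)
ess_mild = kish_effective_sample_size(mild_shift)
ess_strong = kish_effective_sample_size(strong_shift)
print(ess_none, ess_mild, ess_strong)
```

With no shift every weight equals one and the effective sample size equals the number of training samples. As the testing marginal moves away from the training marginal, a few large ratios dominate the weighted sum and the effective sample size drops, which is the regime in which the summary says reweighted methods can perform poorly.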