Similarity of samples and trimming

Bibliographic Details
Authors: Álvarez Esteban, Pedro César (ORCID 0000-0002-8818-0194); Barrio Tellado, Eustasio del; Cuesta Albertos, Juan Antonio (ORCID 0000-0001-8228-5924); Matran Bea, Carlos
Resource type: article
Publication date: 2012
Country: Spain
Institution: Universidad de Cantabria (UC)
Repository: UCrea Repositorio Abierto de la Universidad de Cantabria
Language: English
OAI Identifier: oai:repositorio.unican.es:10902/29685
Online access: https://hdl.handle.net/10902/29685
Access Level: open access
Keywords: Asymptotics
Bootstrap
Consistency
Mass transportation problem
Over-fitting
Robustness
Similarity of distributions
Trimmed probability
Wasserstein distance
Description
Summary: We say that two probabilities are similar at level α if they are contaminated versions (up to an α fraction) of the same common probability. We show how this model is related to minimal distances between sets of trimmed probabilities. Empirical versions turn out to present an overfitting effect in the sense that trimming beyond the similarity level results in trimmed samples that are closer than expected to each other. We show how this can be combined with a bootstrap approach to assess similarity from two data samples.
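The similarity model in the summary can be written in symbols. The following is only a sketch: the similarity level is written α here, and the names P_0, P_i', ε_i, and R_α(P) are notation assumed for illustration, not taken from this record.

```latex
% Contamination model: P_1 and P_2 share a common core P_0,
% each contaminated by at most an \alpha fraction (assumed notation).
P_i = (1-\varepsilon_i)\,P_0 + \varepsilon_i\,P_i',
\qquad 0 \le \varepsilon_i \le \alpha, \quad i = 1, 2.

% One common way to define the set of \alpha-trimmed versions of P
% (an assumption here): probabilities obtained by downweighting P
% and renormalizing, with at most an \alpha fraction of mass removed.
\mathcal{R}_\alpha(P) = \Bigl\{\, Q \ll P \;:\; \frac{dQ}{dP} \le \frac{1}{1-\alpha} \,\Bigr\},
\qquad 0 \le \alpha < 1.
```

Under these assumptions, P_i ≥ (1-α) P_0 for i = 1, 2, so the common probability P_0 belongs to both R_α(P_1) and R_α(P_2). The minimal distance between the two sets of trimmed probabilities is therefore zero whenever the two probabilities are similar at level α, which is the link between the model and trimming that the summary describes.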