Similarity of samples and trimming

We say that two probabilities are similar at level a if they are contaminated versions (up to an a fraction) of the same common probability. We show how this model is related to minimal distances between sets of trimmed probabilities. Empirical versions turn out to present an overfitting effect in t...
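
The similarity model in the abstract can be written out as a short worked formulation. This is a sketch under stated assumptions, not the authors' exact notation: the symbols $\alpha$ (the level written "a" above), $P_0$, $Q_i$, $\varepsilon_i$, $R_\alpha$ and $d$ are introduced here for illustration only.

```latex
% Assumed notation (not taken from the record): P_1, P_2 are the two
% probabilities, P_0 the common part, Q_i the contaminations.
% Similarity at level \alpha:
\[
  P_i = (1-\varepsilon_i)\,P_0 + \varepsilon_i\,Q_i ,
  \qquad 0 \le \varepsilon_i \le \alpha , \quad i = 1, 2 .
\]
% Set of \alpha-trimmed versions of a probability P:
\[
  R_\alpha(P) = \Bigl\{ P^{*} \ll P \;:\;
      \frac{dP^{*}}{dP} \le \frac{1}{1-\alpha} \ \ P\text{-a.s.} \Bigr\} .
\]
% From the first display, P_0 \le P_i/(1-\varepsilon_i) \le P_i/(1-\alpha),
% so P_0 \in R_\alpha(P_1) \cap R_\alpha(P_2). Conversely, any common
% element P_0 gives P_i = (1-\alpha)P_0 + \alpha Q_i with
% Q_i = (P_i - (1-\alpha)P_0)/\alpha. Hence, for a distance d
% (for example a Wasserstein distance) whose minimum is attained,
\[
  P_1, P_2 \text{ similar at level } \alpha
  \iff R_\alpha(P_1) \cap R_\alpha(P_2) \neq \emptyset
  \iff \min_{P_i^{*} \in R_\alpha(P_i)} d\bigl(P_1^{*}, P_2^{*}\bigr) = 0 .
\]
```

Read this way, similarity becomes a statement about the minimal distance between the two sets of trimmed probabilities, which is the relation the abstract refers to.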

Bibliographic Details
Authors: Álvarez Esteban, Pedro César (ORCID 0000-0002-8818-0194); Barrio Tellado, Eustasio del; Cuesta Albertos, Juan Antonio (ORCID 0000-0001-8228-5924); Matran Bea, Carlos
Resource type: article
Publication date: 2012
Country: Spain
Institution: Universidad de Cantabria (UC)
Repository: UCrea Repositorio Abierto de la Universidad de Cantabria
Language: English
OAI Identifier: oai:repositorio.unican.es:10902/29685
Online access: https://hdl.handle.net/10902/29685
Access Level: open access
Keywords: Asymptotics
Bootstrap
Consistency
Mass transportation problem
Over-fitting
Robustness
Similarity of distributions
Trimmed probability
Wasserstein distance
id ES_7f6f26c8d277609ab80baffc8c95ece4
oai_identifier_str oai:repositorio.unican.es:10902/29685
network_acronym_str ES
network_name_str España
repository_id_str
spelling Similarity of samples and trimming
Álvarez Esteban, Pedro César|||0000-0002-8818-0194
Barrio Tellado, Eustasio del
Cuesta Albertos, Juan Antonio|||0000-0001-8228-5924
Matran Bea, Carlos
Asymptotics
Bootstrap
Consistency
Mass transportation problem
Over-fitting
Robustness
Similarity of distributions
Trimmed probability
Wasserstein distance
We say that two probabilities are similar at level a if they are contaminated versions (up to an a fraction) of the same common probability. We show how this model is related to minimal distances between sets of trimmed probabilities. Empirical versions turn out to present an overfitting effect in the sense that trimming beyond the similarity level results in trimmed samples that are closer than expected to each other. We show how this can be combined with a bootstrap approach to assess similarity from two data samples.
Research partially supported by the Spanish Ministerio de Ciencia e Innovación, Grant MTM2008-06067-C02-01, and 02 and by the Consejería de Educación y Cultura de la Junta de Castilla y León, GR150. The authors would like to thank two anonymous referees for their careful reading of the manuscript, their suggestions and the pointers to relevant references that helped us to greatly improve our original version.
International Statistical Institute; Chapman and Hall
Universidad de Cantabria
2012
2012-05-01
journal article
http://purl.org/coar/resource_type/c_6501
NA
http://purl.org/coar/version/c_be7fb7dd8ff6fe43
info:eu-repo/semantics/article
https://hdl.handle.net/10902/29685
Bernoulli, 2012, 18(2), 606-634
reponame:UCrea Repositorio Abierto de la Universidad de Cantabria
instname:Universidad de Cantabria (UC)
Inglés
eng
open access
http://purl.org/coar/access_right/c_abf2
info:eu-repo/semantics/openAccess
oai:repositorio.unican.es:10902/29685
2026-06-02T12:39:31Z
dc.title.none.fl_str_mv Similarity of samples and trimming
title Similarity of samples and trimming
spellingShingle Similarity of samples and trimming
Álvarez Esteban, Pedro César|||0000-0002-8818-0194
Asymptotics
Bootstrap
Consistency
Mass transportation problem
Over-fitting
Robustness
Similarity of distributions
Trimmed probability
Wasserstein distance
title_short Similarity of samples and trimming
title_full Similarity of samples and trimming
title_fullStr Similarity of samples and trimming
title_full_unstemmed Similarity of samples and trimming
title_sort Similarity of samples and trimming
dc.creator.none.fl_str_mv Álvarez Esteban, Pedro César|||0000-0002-8818-0194
Barrio Tellado, Eustasio del
Cuesta Albertos, Juan Antonio|||0000-0001-8228-5924
Matran Bea, Carlos
author Álvarez Esteban, Pedro César|||0000-0002-8818-0194
author_facet Álvarez Esteban, Pedro César|||0000-0002-8818-0194
Barrio Tellado, Eustasio del
Cuesta Albertos, Juan Antonio|||0000-0001-8228-5924
Matran Bea, Carlos
author_role author
author2 Barrio Tellado, Eustasio del
Cuesta Albertos, Juan Antonio|||0000-0001-8228-5924
Matran Bea, Carlos
author2_role author
author
author
dc.contributor.none.fl_str_mv Universidad de Cantabria
dc.subject.none.fl_str_mv Asymptotics
Bootstrap
Consistency
Mass transportation problem
Over-fitting
Robustness
Similarity of distributions
Trimmed probability
Wasserstein distance
topic Asymptotics
Bootstrap
Consistency
Mass transportation problem
Over-fitting
Robustness
Similarity of distributions
Trimmed probability
Wasserstein distance
description We say that two probabilities are similar at level a if they are contaminated versions (up to an a fraction) of the same common probability. We show how this model is related to minimal distances between sets of trimmed probabilities. Empirical versions turn out to present an overfitting effect in the sense that trimming beyond the similarity level results in trimmed samples that are closer than expected to each other. We show how this can be combined with a bootstrap approach to assess similarity from two data samples.
publishDate 2012
dc.date.none.fl_str_mv 2012
2012-05-01
dc.type.none.fl_str_mv journal article
http://purl.org/coar/resource_type/c_6501
NA
http://purl.org/coar/version/c_be7fb7dd8ff6fe43
dc.type.openaire.fl_str_mv info:eu-repo/semantics/article
format article
dc.identifier.none.fl_str_mv https://hdl.handle.net/10902/29685
url https://hdl.handle.net/10902/29685
dc.language.none.fl_str_mv Inglés
eng
language_invalid_str_mv Inglés
language eng
dc.rights.none.fl_str_mv open access
http://purl.org/coar/access_right/c_abf2
dc.rights.openaire.fl_str_mv info:eu-repo/semantics/openAccess
rights_invalid_str_mv open access
http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv International Statistical Institute; Chapman and Hall
publisher.none.fl_str_mv International Statistical Institute; Chapman and Hall
dc.source.none.fl_str_mv Bernoulli, 2012, 18(2), 606-634
reponame:UCrea Repositorio Abierto de la Universidad de Cantabria
instname:Universidad de Cantabria (UC)
instname_str Universidad de Cantabria (UC)
reponame_str UCrea Repositorio Abierto de la Universidad de Cantabria
collection UCrea Repositorio Abierto de la Universidad de Cantabria
repository.name.fl_str_mv
repository.mail.fl_str_mv
_version_ 1869411824097558528