p-probabilistic k-anonymous microaggregation for the anonymization of surveys with uncertain participation


Bibliographic Details
Authors: Rebollo Monedero, David (ORCID 0000-0002-0783-2382); Forné Muñoz, Jorge (ORCID 0000-0002-8401-3292); Soriano Ibáñez, Miguel (ORCID 0000-0003-0457-8531); Puiggalí Allepuz, Jordi
Resource type: article
Publication date: 2016
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI identifier: oai:upcommons.upc.edu:2117/103718
Online access: https://hdl.handle.net/2117/103718
https://dx.doi.org/10.1016/j.ins.2016.12.002
Access level: open access
Keywords: Mathematical statistics
k-Anonymity
Microaggregation
Probabilistic anonymity
Surveys
Estadística matemàtica
Àrees temàtiques de la UPC::Matemàtiques i estadística::Estadística matemàtica
Description
Summary: We develop a probabilistic variant of k-anonymous microaggregation, which we term p-probabilistic, that resorts to a statistical model of respondent participation in order to aggregate quasi-identifiers in such a manner that k-anonymity is enforced with a parametric probabilistic guarantee. Succinctly, owing to the possibility that some respondents may not finally participate, sufficiently larger cells are created, striving to satisfy k-anonymity with probability at least p. The microaggregation function is designed before the respondents submit their confidential data. More precisely, a specification of the function is sent to them, which they may verify and apply to their quasi-identifying demographic variables prior to submitting the microaggregated data, along with the confidential attributes, to an authorized repository. We propose a number of metrics to assess the performance of our probabilistic approach in terms of anonymity and distortion, which we proceed to investigate theoretically in depth and empirically with synthetic and standardized data. We stress that, in addition to constituting a functional extension of traditional microaggregation, thereby broadening its applicability to the anonymization of statistical databases in a wide variety of contexts, the relaxation of trust assumptions is arguably expected to have a considerable impact on user acceptance and ultimately on data utility through mere availability.
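
The idea of enlarging cells so that k-anonymity holds with probability at least p can be illustrated with a small sketch. The summary does not specify the participation model, so the code below assumes, purely for illustration, that each invited respondent participates independently with the same probability q; under that assumption the number of participants in a cell of n invited respondents is Binomial(n, q). The names `prob_at_least_k` and `min_cell_size` are hypothetical and are not taken from the article.

```python
from math import comb


def prob_at_least_k(n: int, k: int, q: float) -> float:
    """Probability that at least k of n invited respondents participate.

    Assumption (not from the article): participation is independent with
    common probability q, so the participant count is Binomial(n, q).
    """
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(k, n + 1))


def min_cell_size(k: int, p: float, q: float) -> int:
    """Smallest number of invited respondents n in a cell such that
    at least k of them participate with probability at least p."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    if not 0 < p < 1:
        raise ValueError("p must lie in (0, 1)")
    n = k  # a cell can never yield k participants with fewer than k invitees
    while prob_at_least_k(n, k, q) < p:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative parameters only: k = 5, p = 0.95, participation rate 0.8.
    print(min_cell_size(5, 0.95, 0.8))
```

With certain participation (q = 1) the sketch reduces to ordinary k-anonymous cells of size k; lower participation rates or a stricter p require larger cells, which is the anonymity-versus-distortion trade-off the summary refers to.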