Beyond multivariate microaggregation for large record anonymization

Bibliographic Details
Author: Nin Guerrero, Jordi (ORCID: 0000-0002-9659-2762)
Resource type: article
Publication date: 2014
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier:oai:upcommons.upc.edu:2117/23297
Online access: https://hdl.handle.net/2117/23297
https://dx.doi.org/10.1007/978-3-319-04178-0_8
Access level: open access
Keywords: Data protection
Database security
Microaggregation
k-anonymity
Privacy in statistical databases
Protecció de dades
Bases de dades -- Seguretat
Àrees temàtiques de la UPC::Informàtica::Seguretat informàtica
Description
Summary: Microaggregation is one of the most commonly employed microdata protection methods. Its basic idea is to anonymize data by aggregating the original records into small groups of at least k elements, thereby preserving k-anonymity. When records are large, i.e., when the data set has many attributes, the data set is usually split into smaller blocks of attributes to limit information loss, and microaggregation is applied to each block successively and independently. This is called multivariate microaggregation. This technique reduces the information loss incurred when several values are collapsed to the centroid of their group. Unfortunately, with multivariate microaggregation, the k-anonymity property is lost when the intruder knows at least two attributes from different blocks, which might be the usual case. In this work, we present a new microaggregation method called one dimension microaggregation (Mic1D-k). With Mic1D-k, the problem of k-anonymity loss is mitigated by mixing all the values in the original microdata file into a single non-attributed data set using a set of simple pre-processing steps, and then microaggregating all the mixed values together. Our experiments show that, using real data, our proposal obtains lower disclosure risk than previous approaches while preserving the level of information loss.
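The idea of pooling every value into one attribute-free set and microaggregating it can be illustrated with a minimal Python sketch. It is not the Mic1D-k algorithm itself: the summary does not specify the pre-processing steps or the grouping strategy, so the per-attribute z-scoring, the fixed-size grouping of sorted values, and the function names `microaggregate_1d` and `pooled_microaggregation` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def microaggregate_1d(values, k):
    """Replace each value by the mean of its group of at least k sorted neighbours."""
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k  # the last group absorbs the remainder
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        group = order[start:end]
        centroid = mean(values[i] for i in group)
        for i in group:
            result[i] = centroid
    return result


def pooled_microaggregation(records, k):
    """Hypothetical sketch: pool all cells of a numeric table and microaggregate them together."""
    n_attrs = len(records[0])
    columns = list(zip(*records))
    # Assumed pre-processing (not taken from the source): z-score each
    # attribute so that values from different attributes are comparable.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    pooled = [
        (row[j] - stats[j][0]) / stats[j][1]
        for row in records
        for j in range(n_attrs)
    ]
    protected = microaggregate_1d(pooled, k)
    # Undo the standardization and restore the original table shape.
    return [
        [protected[r * n_attrs + j] * stats[j][1] + stats[j][0] for j in range(n_attrs)]
        for r in range(len(records))
    ]
```

For instance, `microaggregate_1d([12, 1, 10, 2, 3, 11], 3)` forms the groups {1, 2, 3} and {10, 11, 12} and replaces each value by its group mean. Because every pooled value shares its centroid with at least k - 1 others regardless of which attribute it came from, no block boundary exists for an intruder to exploit, which is the intuition the summary describes.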