Conciliating privacy and utility in data releases via individual differential privacy and microaggregation

Bibliographic Details
Authors: Soria-Comas, Jordi; Sanchez, David; Domingo-Ferrer, Josep; Martinez, Sergio; Del Vasto-Terrientes, Luis
Resource type: article
Status: Published version
Publication date: 2025
Country: Spain
Institution: Universitat Oberta de Catalunya (UOC)
Repository: O2, UOC institutional repository
OAI Identifier: oai:openaccess.uoc.edu:10609/152391
Online access: http://hdl.handle.net/10609/152391
Access Level: open access
Keywords: individual differential privacy; machine learning; data microaggregation; data releases
Description
Summary: ε-Differential privacy (DP) is a well-known privacy model that offers strong privacy guarantees. However, when applied to data releases, DP significantly deteriorates the analytical utility of the protected outcomes. To keep data utility at reasonable levels, practical applications of DP to data releases have used weak privacy parameters (large ε), which dilute the privacy guarantees of DP. In this work, we tackle this issue by using an alternative formulation of the DP privacy guarantees, named ε-individual differential privacy (iDP), which causes less data distortion while providing the same protection as DP to subjects. We enforce iDP in data releases by relying on attribute masking plus a pre-processing step based on data microaggregation. The goal of this step is to reduce the sensitivity to record changes, which determines the amount of noise required to enforce iDP (and DP). Specifically, we propose data microaggregation strategies designed for iDP whose sensitivities are significantly lower than those used in DP. As a result, we obtain iDP-protected data with significantly better utility than with DP. We report on experiments that show how our approach can provide strong privacy (small ε) while yielding protected data that do not significantly degrade the accuracy of secondary data analysis.
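
The summary's central idea, that microaggregating records before masking lowers the sensitivity and therefore the noise needed, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names (`microaggregate`, `noisy_release`), the single numeric attribute, the fixed-size grouping of sorted values, and the per-centroid sensitivity of `(upper - lower) / k` are all assumptions made for illustration. That sensitivity bound only holds if group membership stays fixed when one record changes, so the sketch does not by itself establish a DP or iDP guarantee.

```python
import numpy as np

def microaggregate(values, k):
    """Group sorted values into clusters of at least k records.

    Returns (labels, centroids): the cluster index of each record and
    the mean of each cluster. Hypothetical helper, not from the paper.
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    order = np.argsort(values, kind="stable")
    n_groups = max(n // k, 1)
    labels = np.empty(n, dtype=int)
    # The record at sorted rank r joins group r // k; leftovers join the last group.
    labels[order] = np.minimum(np.arange(n) // k, n_groups - 1)
    centroids = np.array([values[labels == g].mean() for g in range(n_groups)])
    return labels, centroids

def noisy_release(values, k, epsilon, lower, upper, rng):
    """Replace each record by its cluster centroid plus Laplace noise.

    Assumption: with membership held fixed, changing one record bounded in
    [lower, upper] moves a centroid by at most (upper - lower) / k.
    """
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    labels, centroids = microaggregate(clipped, k)
    sensitivity = (upper - lower) / k  # illustrative, not the paper's bound
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=centroids.shape)
    return (centroids + noise)[labels]

rng = np.random.default_rng(0)
ages = [23, 25, 31, 47, 52, 58, 64]
protected = noisy_release(ages, k=3, epsilon=1.0, lower=0, upper=100, rng=rng)
```

In this sketch the Laplace scale is `(upper - lower) / (k * epsilon)`, so a larger `k` means less noise per centroid at the cost of coarser groups. This is the kind of trade-off the summary refers to when it says the pre-processing step reduces the sensitivity to record changes.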