Differentially private publication of database streams via hybrid video coding

While most anonymization technology available today is designed for static and small data, the current picture is of massive volumes of dynamic data arriving at unprecedented velocities. From the standpoint of anonymization, the most challenging type of dynamic data is data streams. However, while t...

Descripción completa

Detalles Bibliográficos
Autores: Parra Arnau, Javier|||0000-0002-1772-1088, Strufe, Thorsten, Domingo Ferrer, Josep
Tipo de recurso: artículo
Fecha de publicación:2022
País:España
Institución:Universitat Politècnica de Catalunya (UPC)
Repositorio:UPCommons. Portal del coneixement obert de la UPC
Idioma:inglés
OAI Identifier:oai:upcommons.upc.edu:2117/386243
Acceso en línea:https://hdl.handle.net/2117/386243
https://dx.doi.org/10.1016/j.knosys.2022.108778
Access Level:acceso abierto
Palabra clave:Data management
Big data
Database anonymization
Data streams
Privacy
Video encoding
Dades massives
Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Telemàtica i xarxes d'ordinadors
Descripción
Sumario:While most anonymization technology available today is designed for static and small data, the current picture is of massive volumes of dynamic data arriving at unprecedented velocities. From the standpoint of anonymization, the most challenging type of dynamic data is data streams. However, while the majority of proposals deal with publishing either count-based or aggregated statistics about the underlying stream, little attention has been paid to the problem of continuously publishing the stream itself with differential privacy guarantees. In this work, we propose an anonymization method that can publish multiple numerical-attribute, finite microdata streams with high protection as well as high utility, the latter aspect measured as data distortion, delay and record reordering. Our method, which relies on the well-known differential pulse-code modulation scheme, adapts techniques originally intended for hybrid video encoding, to favor and leverage dependencies among the blocks of the original stream and thereby reduce data distortion. The proposed solution is assessed experimentally on two of the largest data sets in the scientific community working in data anonymization. Our extensive empirical evaluation shows the trade-off among privacy protection, data distortion, delay and record reordering, and demonstrates the suitability of adapting video-compression techniques to anonymize database streams.