Differentially private publication of database streams via hybrid video coding
While most anonymization technology available today is designed for static and small data, the current picture is of massive volumes of dynamic data arriving at unprecedented velocities. From the standpoint of anonymization, the most challenging type of dynamic data is data streams. However, while t...
| Autores: | , , |
|---|---|
| Tipo de recurso: | artículo |
| Fecha de publicación: | 2022 |
| País: | España |
| Institución: | Universitat Politècnica de Catalunya (UPC) |
| Repositorio: | UPCommons. Portal del coneixement obert de la UPC |
| Idioma: | inglés |
| OAI Identifier: | oai:upcommons.upc.edu:2117/386243 |
| Acceso en línea: | https://hdl.handle.net/2117/386243 https://dx.doi.org/10.1016/j.knosys.2022.108778 |
| Access Level: | acceso abierto |
| Palabra clave: | Data management Big data Database anonymization Data streams Privacy Video encoding Dades massives Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Telemàtica i xarxes d'ordinadors |
| Sumario: | While most anonymization technology available today is designed for static and small data, the current picture is of massive volumes of dynamic data arriving at unprecedented velocities. From the standpoint of anonymization, the most challenging type of dynamic data is data streams. However, while the majority of proposals deal with publishing either count-based or aggregated statistics about the underlying stream, little attention has been paid to the problem of continuously publishing the stream itself with differential privacy guarantees. In this work, we propose an anonymization method that can publish multiple numerical-attribute, finite microdata streams with high protection as well as high utility, the latter aspect measured as data distortion, delay and record reordering. Our method, which relies on the well-known differential pulse-code modulation scheme, adapts techniques originally intended for hybrid video encoding, to favor and leverage dependencies among the blocks of the original stream and thereby reduce data distortion. The proposed solution is assessed experimentally on two of the largest data sets in the scientific community working in data anonymization. Our extensive empirical evaluation shows the trade-off among privacy protection, data distortion, delay and record reordering, and demonstrates the suitability of adapting video-compression techniques to anonymize database streams. |
|---|