Operationalizing and automating data governance

Bibliographic Details
Authors: Nadal Francesch, Sergi (ORCID 0000-0002-8565-952X); Jovanovic, Petar (ORCID 0000-0003-4635-6646); Bilalli, Besim (ORCID 0000-0002-0575-2389); Romero Moral, Óscar (ORCID 0000-0001-6350-8328)
Resource type: article
Publication date: 2022
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/378233
Online access: https://hdl.handle.net/2117/378233
https://dx.doi.org/10.1186/s40537-022-00673-5
Access level: open access
Keywords: Big data
Metadata
Data governance
Data integration
Dades massives
Metadades
UPC subject areas::Computer science::Information systems
Description
Abstract: The ability to combine data from multiple sources represents a competitive advantage for organizations. Yet the governance of the data lifecycle, from the data sources to valuable insights, is largely performed in an ad hoc or manual manner. This is particularly concerning in scenarios where tens or hundreds of continuously evolving data sources produce semi-structured data. To overcome this challenge, we develop a framework for operationalizing and automating data governance. To operationalize it, we propose a zoned data lake architecture and a set of data governance processes that allow the systematic ingestion, transformation, and integration of data from heterogeneous sources, making them readily available to business users. To automate it, we propose a set of metadata artifacts that allow the automatic execution of data governance processes, addressing a wide range of data management challenges. We showcase the usefulness of the proposed approach with a real-world use case stemming from a collaborative project with the World Health Organization on the management and analysis of data about Neglected Tropical Diseases. Overall, this work helps organizations adopt data-driven strategies by providing a cohesive framework that operationalizes and automates data governance.