Operationalizing and automating data validation in data spaces

Bibliographic Details
Authors: Hmimou Ham Man, Achraf; Jovanovic, Petar (ORCID 0000-0003-4635-6646); Nadal Francesch, Sergi (ORCID 0000-0002-8565-952X); Romero Moral, Óscar (ORCID 0000-0001-6350-8328); Queralt Calafat, Anna (ORCID 0000-0003-2782-2955)
Resource type: article
Publication date: 2025
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/449017
Online access: https://hdl.handle.net/2117/449017
https://dx.doi.org/10.1007/s41019-025-00317-7
Access level: open access
Keywords: Data spaces
Data governance
Data validation
Knowledge graphs
Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació
Description
Summary: Data spaces have recently emerged as an innovative paradigm for cross-organizational data sharing. These decentralized environments require sophisticated data governance protocols to ensure compliance with data standards, roles, and policies. While current policy-based solutions address enforcement of data access control and usage rights, they lack mechanisms for automated data validation, which is essential for ensuring data quality for collaborative analytics. To address this gap, we present a knowledge graph-based framework to automate data validation in line with data policies. This framework relies on the concept of policy checkers, which represent high-level and technology-agnostic data validation plans that can be dynamically translated into technology-specific user-defined functions (UDFs) for compliance checking. Importantly, the use of knowledge graphs to describe the policy checkers enhances the transparency and traceability of data validation processes, while the two-stage process (technology-agnostic policy checkers and technology-specific UDFs) accommodates data validation on multimodal data. We accompany the description of our approach with a proof of concept that demonstrates the feasibility of this solution in real data spaces.
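
The two-stage idea in the summary (a technology-agnostic policy checker that is translated into a technology-specific UDF) can be illustrated with a minimal Python sketch. This is not the authors' framework: the `Check` and `PolicyChecker` classes, the operator vocabulary (`not_null`, `min`, `max`), and the `to_python_udf` function are all hypothetical names chosen for illustration, and the knowledge graph representation described in the article is replaced here by plain dataclasses.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class Check:
    """One technology-agnostic validation step (hypothetical vocabulary)."""
    attribute: str
    operator: str  # one of "not_null", "min", "max"
    value: Any = None


@dataclass(frozen=True)
class PolicyChecker:
    """Stage 1: a high-level plan; every check must hold for every row."""
    policy_id: str
    checks: List[Check]


# Stage 2 (assumed target technology: plain Python over dict rows).
# Each abstract operator is mapped to a concrete predicate.
_PYTHON_OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "not_null": lambda cell, _: cell is not None,
    "min": lambda cell, bound: cell is not None and cell >= bound,
    "max": lambda cell, bound: cell is not None and cell <= bound,
}


def to_python_udf(checker: PolicyChecker) -> Callable[[List[Row]], bool]:
    """Translate the abstract plan into a UDF over a list of dict rows."""
    def udf(rows: List[Row]) -> bool:
        return all(
            _PYTHON_OPERATORS[c.operator](row.get(c.attribute), c.value)
            for row in rows
            for c in checker.checks
        )
    return udf


checker = PolicyChecker(
    policy_id="example-policy",  # hypothetical identifier
    checks=[
        Check("age", "not_null"),
        Check("age", "min", 0),
        Check("age", "max", 120),
    ],
)
validate = to_python_udf(checker)
print(validate([{"age": 34}, {"age": 71}]))    # True
print(validate([{"age": 34}, {"age": None}]))  # False
```

Under these assumptions, supporting another execution technology would mean adding a second translator that reads the same `PolicyChecker` and emits, for example, a SQL or dataframe expression, while the plan itself stays unchanged.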