Operationalizing and automating data validation in data spaces
| Authors: | , , , , |
|---|---|
| Resource type: | article |
| Publication date: | 2025 |
| Country: | Spain |
| Institution: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | English |
| OAI Identifier: | oai:upcommons.upc.edu:2117/449017 |
| Online access: | https://hdl.handle.net/2117/449017 https://dx.doi.org/10.1007/s41019-025-00317-7 |
| Access Level: | open access |
| Keywords: | Data spaces; Data governance; Data validation; Knowledge graphs; Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació |
| Summary: | Data spaces have recently emerged as an innovative paradigm for cross-organizational data sharing. These decentralized environments require sophisticated data governance protocols to ensure compliance with data standards, roles and policies. While current policy-based solutions address enforcement of data access control and usage rights, they lack mechanisms for automated data validation, which is essential for ensuring data quality for collaborative analytics. To address this gap, we present a knowledge graph-based framework to automate data validation in line with data policies. This framework relies on the concept of policy checkers, which represent high-level and technology-agnostic data validation plans that can be dynamically translated into technology-specific user-defined functions (UDFs) for compliance checking. Importantly, the use of knowledge graphs to describe the policy checkers enhances the transparency and traceability of data validation processes, while the two-stage process (technology-agnostic policy checkers and technology-specific UDFs) accommodates data validation on multimodal data. We accompany the description of our approach with a proof of concept that demonstrates the feasibility of this solution in real data spaces. |