Incremental schema integration for data wrangling via knowledge graphs

Flores Herrera, Javier de Jesús (ORCID 0000-0002-2998-9962); Rabbani, Kashif; Nadal Francesch, Sergi (ORCID 0000-0002-8565-952X); Gómez Seoane, Cristina (ORCID 0000-0002-3872-0439); Romero Moral, Óscar (ORCID 0000-0001-6350-8328); Jamin, Emmanuel; Dasiopoulou, Stamatia

Bibliographic details
Authors:	Flores Herrera, Javier de Jesús (ORCID 0000-0002-2998-9962); Rabbani, Kashif; Nadal Francesch, Sergi (ORCID 0000-0002-8565-952X); Gómez Seoane, Cristina (ORCID 0000-0002-3872-0439); Romero Moral, Óscar (ORCID 0000-0001-6350-8328); Jamin, Emmanuel; Dasiopoulou, Stamatia
Resource type:	article
Publication date:	2023
Country:	Spain
Institution:	Universitat Politècnica de Catalunya (UPC)
Repository:	UPCommons. Portal del coneixement obert de la UPC
Language:	English
OAI identifier:	oai:upcommons.upc.edu:2117/390416
Online access:	https://hdl.handle.net/2117/390416 https://dx.doi.org/10.3233/SW-233347
Access level:	open access
Keywords:	Big data; Decision-making; Schema integration; Bootstrapping; Virtual data integration; UPC subject areas::Computer science::Information systems

Description
Abstract:	Virtual data integration is currently the go-to approach for data wrangling in data-driven decision-making. In this paper, we focus on automating schema integration, which extracts a homogenised representation of the data source schemata and integrates them into a global schema to enable virtual data integration. Schema integration requires a set of well-known constructs: the data source schemata and wrappers, a global integrated schema, and the mappings between them. Based on these constructs, virtual data integration systems enable fast, on-demand data exploration via query rewriting. Unfortunately, these constructs are currently generated in a largely manual manner, which hinders their feasibility in real scenarios. The problem is aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged semi-automatic and incremental approach grounded on knowledge graphs that generates the required schema integration constructs in four main steps: bootstrapping, schema matching, schema integration, and generation of system-specific constructs. We also present NextiaDI, a tool implementing our approach. Finally, we present a comprehensive evaluation that scrutinizes the approach.
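To make the constructs named in the abstract more concrete (source schemata, a global schema, and the mappings between them, later used for query rewriting), here is a toy Python sketch of how they might fit together. It is not NextiaDI's implementation and does not reproduce the paper's algorithms. The source names, attribute names, triple vocabulary (`hasAttribute`, `datatype`) and the 0.8 threshold are all hypothetical. Schema matching is replaced by naive name similarity via the standard library's `difflib.SequenceMatcher`, and wrappers and the fourth step (generation of system-specific constructs) are left out.

```python
from difflib import SequenceMatcher

# Hypothetical example sources: attribute name -> datatype.
SOURCES = {
    "hospital_csv": {"patient_id": "string", "patient_age": "int", "city": "string"},
    "clinic_json": {"patientId": "string", "patientAge": "int", "town": "string"},
}


def bootstrap(source, attrs):
    """Step 1 (sketch): represent a source schema as (subject, predicate, object) triples."""
    triples = [(source, "type", "Source")]
    for attr, dtype in attrs.items():
        node = f"{source}.{attr}"
        triples.append((source, "hasAttribute", node))
        triples.append((node, "datatype", dtype))
    return triples


def attributes(triples):
    """Return (attribute node, datatype) pairs of a bootstrapped graph, in insertion order."""
    dtypes = {s: o for s, p, o in triples if p == "datatype"}
    return [(o, dtypes[o]) for s, p, o in triples if p == "hasAttribute"]


def local_name(node):
    """Strip the source prefix and normalise case and underscores before comparing."""
    return node.split(".", 1)[1].lower().replace("_", "")


def match(g1, g2, threshold=0.8):
    """Step 2 (sketch): align attributes with equal datatypes and similar names."""
    return [
        (a, b)
        for a, ta in attributes(g1)
        for b, tb in attributes(g2)
        if ta == tb
        and SequenceMatcher(None, local_name(a), local_name(b)).ratio() >= threshold
    ]


def integrate(g1, g2, alignments):
    """Step 3 (sketch): build global attributes and their mappings to source attributes."""
    mappings = {}
    aligned = set()
    for a, b in alignments:
        # Arbitrary choice: name the global attribute after the first source's attribute.
        entry = mappings.setdefault(a.split(".", 1)[1], [a])
        entry.append(b)
        aligned.update((a, b))
    for node, _ in attributes(g1) + attributes(g2):
        if node not in aligned:
            # Unmatched attributes become global attributes of their own.
            mappings.setdefault(node.split(".", 1)[1], []).append(node)
    return mappings


def rewrite(global_attr, mappings):
    """Query time (sketch): unfold a global attribute into the source attributes to fetch."""
    return mappings.get(global_attr, [])


if __name__ == "__main__":
    g1 = bootstrap("hospital_csv", SOURCES["hospital_csv"])
    g2 = bootstrap("clinic_json", SOURCES["clinic_json"])
    mappings = integrate(g1, g2, match(g1, g2))
    print(rewrite("patient_id", mappings))
    # prints ['hospital_csv.patient_id', 'clinic_json.patientId']
```

In this toy run, `city` and `town` are not aligned because their names are dissimilar. That is exactly the kind of gap a real matcher, or a human in the loop of a semi-automatic approach, would need to close.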