An integration-oriented ontology to govern evolution in big data ecosystems

Nadal Francesch, Sergi (ORCID 0000-0002-8565-952X); Romero Moral, Óscar (ORCID 0000-0001-6350-8328); Abelló Gamazo, Alberto (ORCID 0000-0002-3223-2186); Vassiliadis, Panos; Vansummeren, Stijn

Bibliographic Details
Authors:	Nadal Francesch, Sergi (ORCID 0000-0002-8565-952X); Romero Moral, Óscar (ORCID 0000-0001-6350-8328); Abelló Gamazo, Alberto (ORCID 0000-0002-3223-2186); Vassiliadis, Panos; Vansummeren, Stijn
Resource type:	article
Publication date:	2018
Country:	Spain
Institution:	Universitat Politècnica de Catalunya (UPC)
Repository:	UPCommons. Portal del coneixement obert de la UPC
Language:	English
OAI Identifier:	oai:upcommons.upc.edu:2117/117075
Online access:	https://hdl.handle.net/2117/117075 https://dx.doi.org/10.1016/j.is.2018.01.006
Access Level:	open access
Keywords:	Semantic web; Ontologies (Information retrieval); Big data; Data integration; Evolution; Web semàtica; Ontologies (Informàtica); Macrodades; Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació

Description
Summary:	Big Data architectures make it possible to flexibly store and process heterogeneous data from multiple sources in their original format. The structure of those data, commonly supplied by means of REST APIs, is continuously evolving, so data analysts need to adapt their analytical processes after each API release. This becomes more challenging when performing an integrated or historical analysis. To cope with such complexity, in this paper we present the Big Data Integration ontology, the core construct to govern the data integration process under schema evolution by systematically annotating it with information regarding the schema of the sources. We present a query rewriting algorithm that, using the annotated ontology, converts queries posed over the ontology into queries over the sources. To cope with syntactic evolution in the sources, we present an algorithm that semi-automatically adapts the ontology upon new releases. This guarantees that ontology-mediated queries correctly retrieve data from the most recent schema version and that historical queries remain correct. A functional and performance evaluation on real-world APIs is performed to validate our approach.
