An integration-oriented ontology to govern evolution in big data ecosystems

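Big Data architectures make it possible to flexibly store and process heterogeneous data from multiple sources in their original format. The structure of those data, commonly supplied by means of REST APIs, evolves continuously, so data analysts must adapt their analytical processes after each API release. This is even more challenging for integrated or historical analyses. To cope with this complexity, we present the Big Data Integration ontology, the core construct for governing the data integration process under schema evolution; it is systematically annotated with information about the schemas of the sources. We present a query rewriting algorithm that uses the annotated ontology to convert queries posed over the ontology into queries over the sources. To cope with syntactic evolution in the sources, we present an algorithm that semi-automatically adapts the ontology upon new releases. This guarantees that ontology-mediated queries correctly retrieve data from the most recent schema version and that historical queries remain correct. We validate our approach with a functional and performance evaluation on real-world APIs.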
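The abstract's two mechanisms, rewriting queries over the ontology into queries over the sources and adapting the ontology when an API ships a new schema version, can be illustrated with a deliberately small Python sketch. It is not the Big Data Integration ontology or either of the paper's algorithms: the `Wrapper` and `IntegrationGraph` classes, the feature names, the SQL-like output strings, and the rule that a new release inherits the mapping of any attribute whose name is unchanged are all hypothetical simplifications assumed here for illustration.

```python
# Toy sketch only: the classes, names, and the name-based mapping rule are
# hypothetical and do not reproduce the paper's BDI ontology or algorithms.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Wrapper:
    """One release (schema version) of a source, e.g. a REST API endpoint."""
    source: str
    version: int
    # Source attribute name -> ontology feature it provides.
    mappings: dict[str, str] = field(default_factory=dict)


@dataclass
class IntegrationGraph:
    """Hypothetical stand-in for an ontology annotated with source schemas."""
    features: set[str] = field(default_factory=set)
    wrappers: list[Wrapper] = field(default_factory=list)

    def rewrite(self, requested: set[str], historical: bool = False) -> list[str]:
        """Turn a query over ontology features into one source query per wrapper.

        A wrapper is used only if it provides every requested feature. By
        default only the latest version of each source is queried; with
        historical=True every version is, and the results would be unioned.
        """
        unknown = requested - self.features
        if unknown:
            raise ValueError(f"unknown features: {sorted(unknown)}")
        latest: dict[str, int] = {}
        for w in self.wrappers:
            latest[w.source] = max(latest.get(w.source, w.version), w.version)
        plans = []
        for w in sorted(self.wrappers, key=lambda w: (w.source, w.version)):
            if not historical and w.version != latest[w.source]:
                continue
            by_feature = {feat: attr for attr, feat in w.mappings.items()}
            if requested.issubset(by_feature):
                cols = ", ".join(by_feature[f] for f in sorted(requested))
                # SQL-like string for illustration; nothing is executed.
                plans.append(f"SELECT {cols} FROM {w.source}_v{w.version}")
        return plans

    def register_release(self, source: str, attributes: list[str]) -> list[str]:
        """Semi-automatically add the next version of `source`.

        Attributes already mapped in the previous version keep that mapping;
        the others are returned so an analyst can map them by hand.
        """
        previous = [w for w in self.wrappers if w.source == source]
        prev = max(previous, key=lambda w: w.version) if previous else None
        new = Wrapper(source, prev.version + 1 if prev else 1)
        unmapped = []
        for attr in attributes:
            if prev is not None and attr in prev.mappings:
                new.mappings[attr] = prev.mappings[attr]
            else:
                unmapped.append(attr)
        self.wrappers.append(new)
        return unmapped


g = IntegrationGraph(features={"Player.name", "Player.goals"})
g.wrappers.append(Wrapper("players", 1, {"name": "Player.name", "goals": "Player.goals"}))
# Release 2 renames "goals" to "scored"; "name" is carried over automatically.
print(g.register_release("players", ["name", "scored"]))  # ['scored']
g.wrappers[-1].mappings["scored"] = "Player.goals"  # the analyst's manual step
print(g.rewrite({"Player.name", "Player.goals"}))
# ['SELECT scored, name FROM players_v2']
print(g.rewrite({"Player.name", "Player.goals"}, historical=True))
# ['SELECT goals, name FROM players_v1', 'SELECT scored, name FROM players_v2']
```

Keeping every version's wrapper instead of overwriting the old one is the design choice in this sketch that lets a single annotated model answer both current queries (latest version only) and historical ones (all versions).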

Bibliographic details
Authors: Nadal Francesch, Sergi (ORCID 0000-0002-8565-952X); Romero Moral, Óscar (ORCID 0000-0001-6350-8328); Abelló Gamazo, Alberto (ORCID 0000-0002-3223-2186); Vassiliadis, Panos; Vansummeren, Stijn
Resource type: journal article
Publication date: 2018
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI identifier: oai:upcommons.upc.edu:2117/117075
Online access: https://hdl.handle.net/2117/117075
https://dx.doi.org/10.1016/j.is.2018.01.006
Access level: open access
Keywords: Semantic web
Ontologies (Information retrieval)
Big data
Data integration
Evolution
Web semàtica
Ontologies (Informàtica)
Macrodades
Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació
Peer reviewed: yes
Version: accepted manuscript (AM)
Funding: European Commission (http://doi.org/10.13039/100010661), Horizon 2020 Framework Programme, project 644018: SUpporting evolution and adaptation of PERsonalized Software by Exploiting contextual Data and End-user feedback
License: Attribution-NonCommercial-NoDerivs 3.0 Spain (http://creativecommons.org/licenses/by-nc-nd/3.0/es/)
Format: application/pdf