MapReduce performance model for Hadoop 2.x

Glushkova, Daria (ORCID 0000-0002-8906-4793); Jovanovic, Petar (ORCID 0000-0003-4635-6646); Abelló Gamazo, Alberto (ORCID 0000-0002-3223-2186)

MapReduce is a popular programming model for distributed processing of large data sets. Apache Hadoop is one of the most common open-source implementations of this paradigm. Performance analysis of concurrent job executions has been recognized as a challenging problem that, at the same time, may pro...

Bibliographic details
Authors:	Glushkova, Daria (ORCID 0000-0002-8906-4793); Jovanovic, Petar (ORCID 0000-0003-4635-6646); Abelló Gamazo, Alberto (ORCID 0000-0002-3223-2186)
Document type:	article
Publication date:	2018
Country:	Spain
Resources:	Universitat Politècnica de Catalunya (UPC)
Repository:	UPCommons. Portal del coneixement obert de la UPC
Language:	English
OAI Identifier:	oai:upcommons.upc.edu:2117/124328
Online access:	https://hdl.handle.net/2117/124328 https://dx.doi.org/10.1016/j.is.2017.11.006
Access level:	Open access
Keywords:	Electronic data processing -- Distributed processing; Cost effectiveness; Hadoop 2.x; MapReduce performance model; Processament distribuït de dades; Cost-eficàcia; Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures distribuïdes

id	ES_2b4a3d0f2801ecc66e65d2de10a731b2
oai_identifier_str	oai:upcommons.upc.edu:2117/124328
network_acronym_str	ES
network_name_str	España
repository_id_str
dc.title.none.fl_str_mv	Mapreduce performance model for Hadoop 2.x
dc.creator.none.fl_str_mv	Glushkova, Daria\|\|\|0000-0002-8906-4793 Jovanovic, Petar\|\|\|0000-0003-4635-6646 Abelló Gamazo, Alberto\|\|\|0000-0002-3223-2186
author	Glushkova, Daria\|\|\|0000-0002-8906-4793
author_facet	Glushkova, Daria\|\|\|0000-0002-8906-4793 Jovanovic, Petar\|\|\|0000-0003-4635-6646 Abelló Gamazo, Alberto\|\|\|0000-0002-3223-2186
author_role	author
author2	Jovanovic, Petar\|\|\|0000-0003-4635-6646 Abelló Gamazo, Alberto\|\|\|0000-0002-3223-2186
author2_role	author author
dc.subject.none.fl_str_mv	Electronic data processing -- Distributed processing Cost effectiveness Hadoop 2.x MapReduce performance model Processament distribuït de dades Cost-eficàcia Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures distribuïdes
topic	Electronic data processing -- Distributed processing Cost effectiveness Hadoop 2.x MapReduce performance model Processament distribuït de dades Cost-eficàcia Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures distribuïdes
description	MapReduce is a popular programming model for distributed processing of large data sets. Apache Hadoop is one of the most common open-source implementations of this paradigm. Performance analysis of concurrent job executions has been recognized as a challenging problem that, at the same time, may provide reasonably accurate job response time estimates at significantly lower cost than experimental evaluation of real setups. In this paper, we tackle the challenge of defining a MapReduce performance model for Hadoop 2.x. While there are several efficient approaches for modeling the performance of MapReduce workloads in Hadoop 1.x, they cannot be applied to Hadoop 2.x because of its fundamental architectural changes and dynamic resource allocation. The proposed solution is therefore based on an existing performance model for Hadoop 1.x, but takes the architectural changes into account and captures the execution flow of a MapReduce job using a queuing network model. In this way, the cost model reflects the intra-job synchronization constraints that arise from contention at shared resources. The accuracy of our solution is validated by comparing our model estimates against measurements in a real Hadoop 2.x setup.
publishDate	2018
dc.date.none.fl_str_mv	2018 2018-11-15 2019 2019-01-01
dc.type.none.fl_str_mv	journal article http://purl.org/coar/resource_type/c_6501 AM http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.openaire.fl_str_mv	info:eu-repo/semantics/article
format	article
dc.identifier.none.fl_str_mv	https://hdl.handle.net/2117/124328 https://dx.doi.org/10.1016/j.is.2017.11.006
url	https://hdl.handle.net/2117/124328 https://dx.doi.org/10.1016/j.is.2017.11.006
dc.language.none.fl_str_mv	Inglés eng
language_invalid_str_mv	Inglés
language	eng
dc.rights.none.fl_str_mv	open access http://purl.org/coar/access_right/c_abf2 Attribution-NonCommercial-NoDerivs 3.0 Spain http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.openaire.fl_str_mv	info:eu-repo/semantics/openAccess
rights_invalid_str_mv	open access http://purl.org/coar/access_right/c_abf2 Attribution-NonCommercial-NoDerivs 3.0 Spain http://creativecommons.org/licenses/by-nc-nd/3.0/es/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Elsevier
publisher.none.fl_str_mv	Elsevier
dc.source.none.fl_str_mv	reponame:UPCommons. Portal del coneixement obert de la UPC instname:Universitat Politècnica de Catalunya (UPC)
instname_str	Universitat Politècnica de Catalunya (UPC)
reponame_str	UPCommons. Portal del coneixement obert de la UPC
collection	UPCommons. Portal del coneixement obert de la UPC
repository.name.fl_str_mv
repository.mail.fl_str_mv
