MapReduce performance model for Hadoop 2.x
MapReduce is a popular programming model for distributed processing of large data sets. Apache Hadoop is one of the most common open-source implementations of this paradigm. Performance analysis of concurrent job executions has been recognized as a challenging problem that, at the same time, may pro...
| Authors: | Glushkova, Daria; Jovanovic, Petar; Abelló Gamazo, Alberto |
|---|---|
| Document type: | article |
| Publication date: | 2018 |
| Country: | Spain |
| Resources: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | English |
| OAI Identifier: | oai:upcommons.upc.edu:2117/124328 |
| Online access: | https://hdl.handle.net/2117/124328 https://dx.doi.org/10.1016/j.is.2017.11.006 |
| Access Level: | Open access |
| Keywords: | Electronic data processing -- Distributed processing; Cost effectiveness; Hadoop 2.x; MapReduce performance model; Processament distribuït de dades; Cost-eficàcia; Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures distribuïdes |
| id | ES_2b4a3d0f2801ecc66e65d2de10a731b2 |
|---|---|
| oai_identifier_str | oai:upcommons.upc.edu:2117/124328 |
| network_acronym_str | ES |
| network_name_str | España |
| repository_id_str | |
| spelling | Mapreduce performance model for Hadoop 2.x; Glushkova, Daria (ORCID 0000-0002-8906-4793); Jovanovic, Petar (ORCID 0000-0003-4635-6646); Abelló Gamazo, Alberto (ORCID 0000-0002-3223-2186); Electronic data processing -- Distributed processing; Cost effectiveness; Hadoop 2.x; MapReduce performance model; Processament distribuït de dades; Cost-eficàcia; Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures distribuïdes; MapReduce is a popular programming model for distributed processing of large data sets. Apache Hadoop is one of the most common open-source implementations of this paradigm. Performance analysis of concurrent job executions has been recognized as a challenging problem that, at the same time, may provide reasonably accurate job response time estimation at significantly lower cost than experimental evaluation of real setups. In this paper, we tackle the challenge of defining a MapReduce performance model for Hadoop 2.x. While there are several efficient approaches for modeling the performance of MapReduce workloads in Hadoop 1.x, they cannot be applied to Hadoop 2.x due to its fundamental architectural changes and dynamic resource allocation. Thus, the proposed solution is based on an existing performance model for Hadoop 1.x, but takes into consideration the architectural changes and captures the execution flow of a MapReduce job by using a queuing network model. This way, the cost model reflects the intra-job synchronization constraints that occur due to contention at shared resources. The accuracy of our solution is validated by comparing our model estimates against measurements in a real Hadoop 2.x setup.; Peer Reviewed; Elsevier; 2019; 2019-01-01; 2018; 2018-11-15; journal article; http://purl.org/coar/resource_type/c_6501; AM; http://purl.org/coar/version/c_ab4af688f83e57aa; info:eu-repo/semantics/article; application/pdf; https://hdl.handle.net/2117/124328; https://dx.doi.org/10.1016/j.is.2017.11.006; reponame:UPCommons. Portal del coneixement obert de la UPC; instname:Universitat Politècnica de Catalunya (UPC); Inglés; eng; open access; http://purl.org/coar/access_right/c_abf2; Attribution-NonCommercial-NoDerivs 3.0 Spain; http://creativecommons.org/licenses/by-nc-nd/3.0/es/; info:eu-repo/semantics/openAccess; oai:upcommons.upc.edu:2117/124328; 2026-05-27T15:37:01Z |
| dc.title.none.fl_str_mv | Mapreduce performance model for Hadoop 2.x |
| title | Mapreduce performance model for Hadoop 2.x |
| spellingShingle | Mapreduce performance model for Hadoop 2.x; Glushkova, Daria (ORCID 0000-0002-8906-4793); Electronic data processing -- Distributed processing; Cost effectiveness; Hadoop 2.x; MapReduce performance model; Processament distribuït de dades; Cost-eficàcia; Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures distribuïdes |
| title_short | Mapreduce performance model for Hadoop 2.x |
| title_full | Mapreduce performance model for Hadoop 2.x |
| title_fullStr | Mapreduce performance model for Hadoop 2.x |
| title_full_unstemmed | Mapreduce performance model for Hadoop 2.x |
| title_sort | Mapreduce performance model for Hadoop 2.x |
| dc.creator.none.fl_str_mv | Glushkova, Daria (ORCID 0000-0002-8906-4793); Jovanovic, Petar (ORCID 0000-0003-4635-6646); Abelló Gamazo, Alberto (ORCID 0000-0002-3223-2186) |
| author | Glushkova, Daria (ORCID 0000-0002-8906-4793) |
| author_facet | Glushkova, Daria (ORCID 0000-0002-8906-4793); Jovanovic, Petar (ORCID 0000-0003-4635-6646); Abelló Gamazo, Alberto (ORCID 0000-0002-3223-2186) |
| author_role | author |
| author2 | Jovanovic, Petar (ORCID 0000-0003-4635-6646); Abelló Gamazo, Alberto (ORCID 0000-0002-3223-2186) |
| author2_role | author; author |
| dc.subject.none.fl_str_mv | Electronic data processing -- Distributed processing; Cost effectiveness; Hadoop 2.x; MapReduce performance model; Processament distribuït de dades; Cost-eficàcia; Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures distribuïdes |
| topic | Electronic data processing -- Distributed processing; Cost effectiveness; Hadoop 2.x; MapReduce performance model; Processament distribuït de dades; Cost-eficàcia; Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures distribuïdes |
| description | MapReduce is a popular programming model for distributed processing of large data sets. Apache Hadoop is one of the most common open-source implementations of this paradigm. Performance analysis of concurrent job executions has been recognized as a challenging problem that, at the same time, may provide reasonably accurate job response time estimation at significantly lower cost than experimental evaluation of real setups. In this paper, we tackle the challenge of defining a MapReduce performance model for Hadoop 2.x. While there are several efficient approaches for modeling the performance of MapReduce workloads in Hadoop 1.x, they cannot be applied to Hadoop 2.x due to its fundamental architectural changes and dynamic resource allocation. Thus, the proposed solution is based on an existing performance model for Hadoop 1.x, but takes into consideration the architectural changes and captures the execution flow of a MapReduce job by using a queuing network model. This way, the cost model reflects the intra-job synchronization constraints that occur due to contention at shared resources. The accuracy of our solution is validated by comparing our model estimates against measurements in a real Hadoop 2.x setup. |
| publishDate | 2018 |
| dc.date.none.fl_str_mv | 2018; 2018-11-15; 2019; 2019-01-01 |
| dc.type.none.fl_str_mv | journal article; http://purl.org/coar/resource_type/c_6501; AM; http://purl.org/coar/version/c_ab4af688f83e57aa |
| dc.type.openaire.fl_str_mv | info:eu-repo/semantics/article |
| format | article |
| dc.identifier.none.fl_str_mv | https://hdl.handle.net/2117/124328; https://dx.doi.org/10.1016/j.is.2017.11.006 |
| url | https://hdl.handle.net/2117/124328; https://dx.doi.org/10.1016/j.is.2017.11.006 |
| dc.language.none.fl_str_mv | Inglés; eng |
| language_invalid_str_mv | Inglés |
| language | eng |
| dc.rights.none.fl_str_mv | open access; http://purl.org/coar/access_right/c_abf2; Attribution-NonCommercial-NoDerivs 3.0 Spain; http://creativecommons.org/licenses/by-nc-nd/3.0/es/ |
| dc.rights.openaire.fl_str_mv | info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv | open access; http://purl.org/coar/access_right/c_abf2; Attribution-NonCommercial-NoDerivs 3.0 Spain; http://creativecommons.org/licenses/by-nc-nd/3.0/es/ |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Elsevier |
| publisher.none.fl_str_mv | Elsevier |
| dc.source.none.fl_str_mv | reponame:UPCommons. Portal del coneixement obert de la UPC; instname:Universitat Politècnica de Catalunya (UPC) |
| instname_str | Universitat Politècnica de Catalunya (UPC) |
| reponame_str | UPCommons. Portal del coneixement obert de la UPC |
| collection | UPCommons. Portal del coneixement obert de la UPC |
| repository.name.fl_str_mv | |
| repository.mail.fl_str_mv | |
| _version_ | 1869405135163097088 |
| score | 15,300724 |