Upgrading a high performance computing environment for massive data processing

Bibliographic details
Authors: Ponce, Lucas M.; dos Santos, Walter; Meira Jr, Wagner; Guedes, Dorgival; Lezzi, Daniele (ORCID 0000-0001-5081-7244); Badia Sala, Rosa Maria (ORCID 0000-0003-2941-5499)
Resource type: article
Publication date: 2019
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/186788
Online access: https://hdl.handle.net/2117/186788
https://dx.doi.org/10.1186/s13174-019-0118-7
Access level: open access
Keywords: Big data
High performance computing
COMPSs
HDFS
Lemonade
High performance computing (Computer science)
UPC subject areas::Computer science::Computer architecture
Description
Abstract: High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In that process, aspects of hardware architectures, systems support and programming paradigms are being revisited from both perspectives. This paper presents our experience along this path of convergence with the proposal of a framework that addresses some of the programming issues arising from such integration. Our contribution is the development of an environment that integrates (i) COMPSs, a programming framework for the development and execution of parallel applications on distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate our framework, we used Lemonade to create COMPSs applications that access data through HDFS, and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and by rearranging data transfer, which reduces execution time. The integration with Lemonade makes COMPSs easier to use and may help popularize it in the Data Science community by providing efficient algorithm implementations for data-domain experts who want to develop applications at a higher level of abstraction.
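The general pattern behind task-based processing of a block-partitioned distributed file (one independent task per data block, followed by a merge of the partial results) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation and not the COMPSs or HDFS API: a Python thread pool stands in for the COMPSs runtime, fixed-size groups of lines stand in for HDFS blocks, and all function names (`split_into_blocks`, `count_words`, `merge`, `word_count`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, lines_per_block):
    """Group records (lines) into fixed-size blocks.

    Hypothetical, simplified stand-in for HDFS blocks: real HDFS splits
    files by byte size, not by a line count.
    """
    return [lines[i:i + lines_per_block] for i in range(0, len(lines), lines_per_block)]


def count_words(block):
    """Per-block task: count words in one block, independently of all others."""
    counts = Counter()
    for line in block:
        counts.update(line.split())
    return counts


def merge(partials):
    """Reduce step: add the per-block partial counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, lines_per_block=2, workers=4):
    blocks = split_into_blocks(lines, lines_per_block)
    # Assumption: a thread pool plays the role of the task scheduler here;
    # a COMPSs runtime would instead schedule each per-block call as a task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, blocks))
    return merge(partials)


if __name__ == "__main__":
    text = ["big data meets hpc", "hpc meets big data", "data data"]
    print(word_count(text).most_common(2))
```

Because each block is processed without reference to the others, the per-block calls can run wherever the data lives, and only the small partial counts need to move for the merge step.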