PyCOMPSs: Parallel computational workflows in Python

Bibliographic Details
Authors: Tejedor, Enric; Becerra Fontal, Yolanda (ORCID 0000-0003-2357-7796); Alomar, Guillem; Queralt Calafat, Anna (ORCID 0000-0003-2782-2955); Badia Sala, Rosa Maria (ORCID 0000-0003-2941-5499); Torres Viñals, Jordi (ORCID 0000-0003-1963-7418); Cortés, Toni (ORCID 0000-0002-2537-8937); Labarta Mancho, Jesús José (ORCID 0000-0002-7489-4727)
Resource type: article
Publication date: 2017
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/110724
Online access: https://hdl.handle.net/2117/110724
https://dx.doi.org/10.1177/1094342015594678
Access level: open access
Keywords: Big data
Parallel programming (Computer science)
Scientific computing
Parallel programming models
Python
Big data storage
Macrodades
Programació en paral·lel (Informàtica)
UPC subject areas::Computer science::Programming languages
Description
Summary: The use of the Python programming language for scientific computing has been gaining momentum in recent years. Its compact, readable syntax and its comprehensive set of scientific libraries are two important characteristics that favour its adoption. Nevertheless, Python still lacks a solution for easily parallelizing generic scripts on distributed infrastructures, since the current alternatives mostly require the use of message-passing APIs or are restricted to embarrassingly parallel computations. To address this gap, this paper presents PyCOMPSs, a framework that facilitates the development of parallel computational workflows in Python. In this approach, the user programs her script in a sequential fashion and decorates the functions to be run as asynchronous parallel tasks. A runtime system is in charge of exploiting the inherent concurrency of the script, detecting the data dependencies between tasks and dispatching them to the available resources. Furthermore, we show how this programming model can be built on top of a Big Data storage architecture, where the data stored in the backend is abstracted and accessed from the application in the form of persistent objects.
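
The programming model described in the summary (a sequential script whose decorated functions run as asynchronous tasks, with data dependencies tracked between them) can be illustrated with a small sketch. This is not the PyCOMPSs API: the `task` decorator, the `wait_on` helper and the thread pool below are hypothetical stand-ins built on the Python standard library, and they run on a single machine rather than on distributed resources.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Hypothetical local "runtime": a thread pool standing in for the
# distributed resources a real framework would manage.
_pool = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Hypothetical decorator: calling the function submits it to the pool
    and immediately returns a Future instead of the computed value."""
    @wraps(func)
    def submit(*args):
        def run():
            # Data dependency: an argument that is itself a Future must be
            # resolved before this task's body can execute.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return func(*resolved)
        return _pool.submit(run)
    return submit


def wait_on(obj):
    """Hypothetical synchronization point: block until the value is ready."""
    return obj.result() if isinstance(obj, Future) else obj


@task
def square(x):
    return x * x


@task
def add(a, b):
    return a + b


# The script reads sequentially, but the four square() calls are independent
# and may run concurrently; each add() waits on the results it consumes.
partials = [square(i) for i in range(4)]
total = partials[0]
for p in partials[1:]:
    total = add(total, p)

result = wait_on(total)
print(result)
```

In this sketch the dependency graph is implicit in the futures passed between calls; a runtime such as the one the summary describes would instead build the graph itself and schedule tasks across the available nodes.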