The MPI/OmpSs parallel programming model
Even today supercomputing systems have already reached millions of cores for a single machine, which are connected by using a complex network interconnection. Reducing communication time across processes becomes the most important issue in order to achieve the highest possible performance. The Messa...
| Autor: | |
|---|---|
| Tipo de recurso: | tesis doctoral |
| Estado: | Versión publicada |
| Fecha de publicación: | 2016 |
| País: | España |
| Institución: | CBUC, CESCA |
| Repositorio: | TDR. Tesis Doctorales en Red |
| OAI Identifier: | oai:www.tdx.cat:10803/398135 |
| Acceso en línea: | http://hdl.handle.net/10803/398135 https://dx.doi.org/10.5821/dissertation-2117-98109 |
| Access Level: | acceso abierto |
| Palabra clave: | Àrees temàtiques de la UPC::Informàtica 004 |
| Sumario: | Even today supercomputing systems have already reached millions of cores for a single machine, which are connected by using a complex network interconnection. Reducing communication time across processes becomes the most important issue in order to achieve the highest possible performance. The Message Passing Interface (MPI), which is the most widely used programming model for large distributed memory, supports asynchronous communication primitives for overlapping communication and computation. However, these primitives are difficult to use and increase code complexity. which then requiring more development effort and making less readable programs. This thesis presents a new programming model, which allows the programmer to easily introduce the asynchrony necessary to overlap communication and computation. The proposed programming model is based on MPI and tasked based shared memory framework, namely OmpSs. The thesis further describes implementation details which in order to allow efficient inter-operation of the OmpSs runtime and MPI. The thesis demonstrates the hybrid use of MPI/OmpSs with several applications of which the HPL benchmark is the most important case study. The hybrid MPI/OmpSs versions significantly improve the performance of the applications compared with their pure MPI counterparts. For the HPL we get close to the asymptotic performance at relatively small problem sizes and still get significant benefits at large problem sizes. In addition, the hybrid MPI/OmpSs approach substantially reduces code complexity and is less sensitive to network bandwidth and operating system noise than the pure MPI versions. In addition, the thesis analyzes and compares current techniques for overlapping computation and collective communication, including approaches using point-to-point communications and additional communication threads, respectively. The thesis stresses the importance of understanding the characteristic of a computational kernel that runs concurrently with communication. Experimental evaluations is done using the Communication Computation Concurrent (CCUBE) synthetic benchmark, developed in this thesis, as well as the HPL. |
|---|