User-directed Vectorization in OmpSs

Bibliographic Details
Author: Caballero de Gea, Diego Luis
Resource type: master's thesis
Publication date: 2011
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2099.1/25812
Online access: https://hdl.handle.net/2099.1/25812
Access level: open access
Keywords: Compilers (Computer programs)
Compiladors (Programes d'ordinador)
Description
Summary: In the recent shift to the multi-core and many-core era, where systems tend to be heterogeneous even at chip level, SIMD instruction sets and accelerators that exploit parallelism in a similar way are gaining prominence in new multiprocessors and systems. This heterogeneity makes it difficult for compilers and parallel programming models to make the most of the available computational resources in an easy, generic, efficient and portable fashion. Although much work on automatic vectorization/simdization techniques has been done over the years, compilers still show important limitations when vectorizing code with pointers and function calls because of the limitations of traditional compiler analyses, such as pointer aliasing analysis. As for parallel programming models, some are restricted to specific architectures, while portable ones, such as OpenCL, require programmers to deal with low-level architecture details and difficult source code transformations, and their performance varies significantly across architectures, which demands new tuning efforts for each one. In an attempt to offer a unified and generic solution to the auto-vectorization/simdization and portability problems, we propose User-directed Vectorization in OmpSs, a high-level programming model extension that lets developers guide the compiler in the vectorization process just by adding simple annotations to the vectorizable areas of the code, such as loops and functions. We focused our design, implementation and evaluation on the Intel SSE instruction set for CPUs, obtaining the same or higher speed-ups than GCC auto-vectorization in easily vectorizable codes, and a performance improvement of up to 2.30 in more complex codes where GCC is not able to apply auto-vectorization and the hand-coded OpenCL version reaches a speed-up of 2.23.
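
To make the idea of annotating loops and functions concrete, here is a minimal sketch in C. The record does not show the annotation syntax that the thesis defines for OmpSs, so the sketch uses the standard OpenMP 4.0 `simd` and `declare simd` directives as a stand-in. The names `scale_add` and `saxpy` are hypothetical and chosen only for illustration. The loop works through pointers and contains a function call, the two cases the summary names as hard for automatic vectorization.

```c
#include <stddef.h>

/* Hypothetical helper. "declare simd" is the OpenMP 4.0 way of asking the
 * compiler to also generate a vector version of a function; it is a stand-in
 * here, not the OmpSs syntax from the thesis. */
#pragma omp declare simd
static inline float scale_add(float a, float x, float y)
{
    return a * x + y;
}

/* Hypothetical kernel: y[i] = a * x[i] + y[i].
 * x and y are plain pointers, so the compiler cannot prove on its own that
 * they do not overlap. The "simd" directive is the programmer's statement
 * that the iterations may safely be executed in SIMD lanes. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = scale_add(a, x[i], y[i]);
    }
}
```

With GCC, these directives are honored when compiling with `-fopenmp-simd` (or `-fopenmp`); without such a flag the pragmas are ignored and the code still compiles and runs as ordinary scalar C.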