Data and stream parallelism optimizations on GPUs

Araujo, Gabriell Alves de

Bibliographic Details
Author:	Araujo, Gabriell Alves de
Resource type:	master's thesis
Status:	Published version
Publication date:	2022
Country:	Brazil
Institution:	Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
Repository:	Biblioteca Digital de Teses e Dissertações da PUC_RS
Language:	English
OAI Identifier:	oai:tede2.pucrs.br:tede/10347
Online access:	https://tede2.pucrs.br/tede2/handle/tede/10347
Access level:	open access
Keywords:	Parallel Programming; GPU Programming; Heterogeneous Computing; Data Parallelism; Stream Parallelism; Structured Parallel Programming; Parallel Patterns; Benchmarks; Stream Processing Applications; Domain-specific Language; Algorithmic Skeletons; Performance Evaluation; High Performance Computing; C; C++; CUDA; OpenCL; Programação Paralela; Programação de GPUs; Computação Heterogênea; Paralelismo de Dados; Paralelismo de Stream; Programação Paralela Estruturada; Padrões Paralelos; Aplicações de Processamento de Stream; Linguagem Específica de Domínio; Esqueletos Algorítmicos; Avaliação de Desempenho; Computação de Alto Desempenho; CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO

Description
Summary:	Nowadays, most computers are equipped with Graphics Processing Units (GPUs) to provide massive-scale parallelism at a low cost. Parallel programming is necessary to exploit this architectural capacity fully. However, it represents a challenge for programmers since it requires refactoring algorithms, designing parallelism techniques, and hardware-specific knowledge. Moreover, GPU parallelism is even more challenging since GPUs have peculiar hardware characteristics and employ a parallelism paradigm called many-core programming. In this sense, parallel computing research has focused on studying efficient programming techniques for GPUs and developing abstractions that reduce the effort when writing parallel code. SPar is a domain-specific language (DSL) that goes in this direction. Programmers can use SPar to express stream parallelism in a simpler way without significantly impacting performance. SPar offers high-level abstractions via code annotations while the SPar compiler generates parallel code. SPar recently received an extension to allow parallel code generation for CPUs and GPUs in stream applications. In the generated code, the CPU cores control the flow of data, while the GPU applies massive parallelism to the computation of each stream element. To this end, SPar generates code for an intermediate library called GSParLib, a pattern-oriented parallel API that provides a unified programming model targeting the CUDA and OpenCL runtimes, so that parallelism can be exploited on GPUs from different vendors. However, the GPU support for both SPar and GSParLib is still in its initial steps; they provide only basic features, and no studies have comprehensively evaluated SPar and GSParLib's performance.
This work contributes by parallelizing representative high-performance computing (HPC) benchmarks, implementing new features and optimizations for GPUs in GSParLib and SPar, and presenting a method for providing agnostic frameworks independent of low-level programming interfaces. Our set of improvements covers most of the critical limitations of GSParLib regarding performance and programmability. In our experiments, the optimized version of GSParLib achieved a speedup improvement of up to 54,500.00% over the original version of GSParLib on data parallelism benchmarks and a throughput improvement of up to 718.43% on stream parallelism benchmarks.
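
The division of work described in the summary, with the host controlling the flow of stream elements and a data-parallel stage computing each element, can be illustrated with a minimal sketch in plain C++. It is not SPar or GSParLib code and uses neither API: the names `map_scale` and `process_stream` are hypothetical, and an ordinary CPU loop stands in for the GPU kernel that would process each index on its own thread.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a GPU kernel: applies the same operation to every
// item of one stream element. Each index is independent of the others, which
// is what would allow one GPU thread per index (data parallelism).
std::vector<float> map_scale(const std::vector<float>& element, float factor) {
    std::vector<float> out(element.size());
    for (std::size_t i = 0; i < element.size(); ++i) {
        out[i] = element[i] * factor;
    }
    return out;
}

// Host-side stream loop (illustrative only): the CPU takes the stream
// elements one at a time, hands each to the data-parallel stage, and
// collects one result per element.
std::vector<float> process_stream(const std::vector<std::vector<float>>& stream,
                                  float factor) {
    std::vector<float> sums;
    sums.reserve(stream.size());
    for (const auto& element : stream) {
        const std::vector<float> scaled = map_scale(element, factor);
        sums.push_back(std::accumulate(scaled.begin(), scaled.end(), 0.0f));
    }
    return sums;
}
```

Calling `process_stream` on a stream of two elements returns two sums, one per element. In a GPU-backed version, the body of `map_scale` would be the part offloaded to the device, while the loop in `process_stream` would stay on the CPU.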
