Sparse Linear System Solvers on GPUs: Parallel Preconditioning, Workload Balancing, and Communication Reduction

Bibliographic Details
Author: Flegar, Goran
Resource type: doctoral thesis
Status: Published version
Publication date: 2019
Country: Spain
Institution: CBUC, CESCA
Repository: TDR. Tesis Doctorales en Red
OAI Identifier: oai:www.tdx.cat:10803/667096
Online access: http://hdl.handle.net/10803/667096
http://dx.doi.org/10.6035/14101.2019.709084
Access level: open access
Keywords: High Performance Computing
Graphics Processing Units
Adaptive Precision
Krylov Methods
Sparse Matrix-Vector Product
Preconditioning
Information and Communication Technologies (ICT)
004
Description
Summary: With the breakdown of Dennard scaling in the mid-2000s and the end of Moore's law on the horizon, the high performance computing community is turning its attention towards unconventional accelerator hardware to ensure the continued growth of computational capacity. This dissertation presents several contributions related to the iterative solution of sparse linear systems on the most widely used general-purpose accelerator, the Graphics Processing Unit (GPU). Specifically, it accelerates the major building blocks of Krylov solvers and describes their realization as part of a software library of reusable components. The first part of the dissertation focuses on the sparse matrix-vector product and on effective load balancing in the presence of irregular sparsity patterns. The second part describes the design of high-performance preconditioners. Finally, the third part demonstrates the potential of adaptive precision techniques for constructing preconditioners with a lower memory footprint and accuracy comparable to that of their full-precision equivalents.
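
To illustrate the first building block named in the summary, the following is a minimal sequential sketch of a sparse matrix-vector product in Python. It is not the dissertation's GPU implementation. The compressed sparse row (CSR) layout is an assumption, since the summary does not say which storage format is used, and the names csr_spmv and row_lengths are hypothetical. The per-row nonzero counts show why irregular sparsity patterns cause load imbalance when each row is assigned to one worker.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def row_lengths(row_ptr):
    """Number of stored nonzeros per row; uneven values mean uneven work per row."""
    return [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]


# Example 3x3 matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0]

y = csr_spmv(row_ptr, col_idx, values, x)
lengths = row_lengths(row_ptr)
```

In this example the third row holds three times as many nonzeros as the second, so a one-row-per-worker assignment would leave some workers idle while others finish. That imbalance is the kind of problem the load-balancing part of the dissertation addresses.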