Implementation of the DWT in a GPU through a register-based strategy

Bibliographic Details
Authors: Enfedaque Montes, Pablo; Aulí Llinàs, Francesc (0000-0002-3208-9957); Moure, Juan C. (0000-0001-6697-0331)
Resource type: article
Publication date: 2015
Country: Spain
Institution: Universitat Autònoma de Barcelona
Repository: Dipòsit Digital de Documents de la UAB
Language: English
OAI Identifier: oai:ddd.uab.cat:144728
Online access: https://ddd.uab.cat/record/144728
https://dx.doi.org/urn:doi:10.1109/TPDS.2014.2384047
Access level: open access
Keywords: Compute Unified Device Architecture (CUDA)
DWT
Discrete Wavelet Transform (DWT)
Graphics Processing Unit (GPU)
Description
Summary: The release of the CUDA Kepler architecture in March 2012 provided Nvidia GPUs with a larger register memory space and with instructions for communicating register values among threads. This enables a programming strategy that uses registers, rather than shared memory, for sharing and reusing data. Such a strategy can significantly improve the performance of applications that reuse data heavily. This paper presents a register-based implementation of the Discrete Wavelet Transform (DWT), the prevailing data decorrelation technique in the field of image coding. Experimental results indicate that the proposed method is at least four times faster than the best GPU implementation of the DWT found in the literature. Furthermore, theoretical analysis agrees with the experimental tests in showing that the execution times achieved by the proposed implementation are close to the GPU's performance limits.
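
Illustrative note: the summary does not say which wavelet filter the paper uses or how its kernels are organized. As a rough sketch of why the DWT reuses data heavily, the following Python code assumes the reversible 5/3 lifting scheme with symmetric boundary extension (an assumption chosen for illustration, not taken from this record). The function names `dwt53_forward` and `dwt53_inverse` are hypothetical. The code is sequential and does not model GPU registers or threads; it only shows that every output coefficient reads neighbouring samples, which is the kind of access a register-based approach would serve from adjacent threads.

```python
def dwt53_forward(x):
    """One level of a 1D reversible 5/3 lifting DWT (assumed filter).

    x: list of ints with even length >= 2.
    Returns (low, high), each of length len(x) // 2.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Predict step: each odd sample minus the floored mean of its two
    # even neighbours (the right edge mirrors the last even sample).
    high = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
            for i in range(half)]
    # Update step: each even sample plus a rounded quarter of its two
    # neighbouring high-pass values (the left edge mirrors high[0]).
    low = [even[i] + ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
           for i in range(half)]
    return low, high


def dwt53_inverse(low, high):
    """Undo dwt53_forward by reversing the two lifting steps."""
    half = len(low)
    even = [low[i] - ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
            for i in range(half)]
    odd = [high[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x
```

Calling `dwt53_inverse(*dwt53_forward(signal))` on an even-length list of integers returns the original list, since each lifting step is undone exactly. Every `high[i]` reads two even samples and every `low[i]` reads two high-pass values, so neighbouring outputs share inputs.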