Implementation of Autoencoders with Systolic Arrays through OpenCL

Bibliographic Details
Authors: Gadea Gironés, Rafael (ORCID 0000-0003-2857-8667); Herrero Bosch, Vicente (ORCID 0000-0003-0860-2789); Monzó Ferrer, José María (ORCID 0000-0001-6554-3231); Colom Palero, Ricardo José (ORCID 0000-0003-0704-4906)
Resource type: article
Publication date: 2021
Country: Spain
Institution: Universitat Politècnica de València (UPV)
Repository: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Language: English
OAI identifier: oai:riunet.upv.es:10251/180091
Online access: https://riunet.upv.es/handle/10251/180091
Access level: open access
Keywords: OpenCL
Neural networks
Systolic arrays
FPGA
Electronic technology
Description
Summary: [EN] In algorithm acceleration and the implementation of the recall phase of deep neural networks, OpenCL-based solutions clearly tend to produce kernels that are perfectly adapted to graphics processing unit (GPU) architectures. However, they fail to obtain the same results when applied to architectures based on field-programmable gate arrays (FPGAs). This situation, along with enormous advances in new GPU architectures, makes it infeasible to defend an FPGA-based acceleration solution, even in terms of energy efficiency. Our goal in this paper is to demonstrate that multikernel structures based on classic systolic arrays can be written in OpenCL, exploiting the most advanced features of FPGAs without resorting to traditional FPGA development in lower-level hardware description languages (HDLs) such as Verilog or VHDL. This OpenCL methodology relies on the intensive use of channels (an Intel FPGA extension of OpenCL) to communicate both data and control, and on refining the OpenCL libraries with register-transfer level (RTL) code to improve the performance of the neurons' base and activation functions. Above all, it reflects the importance of adequate communication between layers when implementing neural networks.
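
Illustrative sketch (not taken from the article): the summary describes kernels arranged as a systolic array that exchange data through channels. As a rough software analogy only, the Python simulation below models a one-dimensional systolic array in which each processing element (PE) computes one neuron of a dense layer and forwards every input value to its neighbour through a FIFO, which stands in for an OpenCL channel. The function name `systolic_layer`, the one-PE-per-neuron mapping, and the cycle schedule are assumptions made for illustration; the article's actual kernels, channel layout, and RTL library functions are not reproduced here.

```python
from collections import deque
import math


def systolic_layer(weights, inputs, activation=math.tanh):
    """Simulate a 1-D systolic array computing activation(W @ x).

    Hypothetical model: one processing element (PE) per output neuron.
    Inputs enter PE 0 and move one PE further per cycle through FIFOs,
    which play the role that channels play between OpenCL kernels.
    """
    n_out, n_in = len(weights), len(inputs)
    # channels[i] feeds PE i; channels[n_out] collects values leaving the array.
    channels = [deque() for _ in range(n_out + 1)]
    acc = [0.0] * n_out  # one accumulator (partial weighted sum) per PE

    # Stream the input vector into the first channel as (index, value) pairs.
    for j, x in enumerate(inputs):
        channels[0].append((j, x))

    # The last input reaches the last PE after n_in + n_out - 1 cycles.
    for _ in range(n_in + n_out - 1):
        # Visit PEs from last to first so a value advances one PE per cycle.
        for i in reversed(range(n_out)):
            if channels[i]:
                j, x = channels[i].popleft()
                acc[i] += weights[i][j] * x      # multiply-accumulate
                channels[i + 1].append((j, x))   # forward to the next PE

    return [activation(a) for a in acc]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [0.0, -1.0]]
    print(systolic_layer(W, [1.0, 0.5]))
```

In this model the only communication between PEs is the FIFO hand-off, which is the property the summary attributes to channel-based designs; on an FPGA each PE would be a separate concurrently running kernel rather than an iteration of a Python loop.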