Optimization of Deep Neural Networks Using SoCs with OpenCL

Bibliographic Details
Authors: Gadea Gironés, Rafael (0000-0003-2857-8667); Colom Palero, Ricardo José (0000-0003-0704-4906); Herrero Bosch, Vicente (0000-0003-0860-2789)
Resource type: article
Publication date: 2018
Country: Spain
Institution: Universitat Politècnica de València (UPV)
Repository: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Language: English
OAI Identifier: oai:riunet.upv.es:10251/121267
Online access: https://riunet.upv.es/handle/10251/121267
Access level: open access
Keywords: Evolutionary computation
Embedded system
FPGA
Deep neural networks
OpenCL
SoC
Electronic technology
Description
Summary: Optimizing deep neural networks (DNNs) with evolutionary algorithms (EAs), and implementing the training needed to compute the objective function, often involves a trade-off between efficiency and flexibility. Pure software solutions implemented on general-purpose processors tend to be slow because they do not take advantage of the inherent parallelism of these devices, whereas hardware realizations on heterogeneous platforms (combining central processing units (CPUs), graphics processing units (GPUs) and/or field-programmable gate arrays (FPGAs)) are built with differing design approaches, methodologies supported by different languages, and very different implementation criteria. This paper first presents a study that demonstrates the need for a heterogeneous (CPU-GPU-FPGA) platform to accelerate the optimization of artificial neural networks (ANNs) using genetic algorithms. Second, it presents implementations of the calculations for the individuals evaluated in such an algorithm on different (CPU- and FPGA-based) platforms, using the same source files written in OpenCL. Implementing individuals on remote, low-cost FPGA systems on a chip (SoCs) is found to achieve good efficiency in terms of performance per watt.
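
As an illustration of the scheme the summary describes, the following minimal Python sketch shows a genetic algorithm in which evaluating each individual is a pluggable, potentially expensive step. It is not the authors' implementation: the function names (`evolve`, `toy_fitness`), the integer encoding of individuals, and the selection, crossover and mutation choices are all assumptions made for illustration. In the paper's setting, the evaluation step would involve training a network, which is the part offloaded to OpenCL devices; here a cheap toy function stands in for it.

```python
import random


def evolve(evaluate, gene_ranges, pop_size=8, generations=5, seed=0):
    """Tiny generational GA (illustrative only).

    `evaluate` maps an individual to a fitness value; higher is better.
    `gene_ranges` is a list of (low, high) integer bounds, one per gene.
    """
    rng = random.Random(seed)
    # Each individual is a list of integers, e.g. neurons per hidden layer.
    pop = [[rng.randint(lo, hi) for lo, hi in gene_ranges]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluation is the costly step that an accelerator would take over.
        ranked = sorted(pop, key=evaluate, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: resample one randomly chosen gene within its bounds.
            i = rng.randrange(len(child))
            lo, hi = gene_ranges[i]
            child[i] = rng.randint(lo, hi)
            children.append(child)
        pop = parents + children
    return max(pop, key=evaluate)


def toy_fitness(individual):
    # Hypothetical stand-in for "train the network and score it".
    return -sum((gene - 32) ** 2 for gene in individual)


best = evolve(toy_fitness, [(1, 64), (1, 64)])
```

Because `evaluate` is passed in as a parameter, the same search loop could call a CPU routine or dispatch to a remote accelerator without changes, which loosely mirrors the flexibility argument made in the summary.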