Efficient simulation execution of cellular automata on GPU

Graphics Processing Units (GPUs) can be used as convenient hardware accelerators to speed up Cellular Automata (CA) simulations, which are employed in many scientific areas. However, an important set of CA have performance constraints due to GPU memory bandwidth. Few studies have fully explored how...

Descripción completa

Detalles Bibliográficos
Autores: Cagigas Muñiz, Daniel, Díaz del Río, Fernando, Sevillano Ramos, José Luis, Guisado Lizar, José Luis
Tipo de recurso: artículo
Estado:Versión publicada
Fecha de publicación:2022
País:España
Institución:Universidad de Sevilla (US)
Repositorio:idUS. Depósito de Investigación de la Universidad de Sevilla
OAI Identifier:oai:idus.us.es:11441/131065
Acceso en línea:https://hdl.handle.net/11441/131065
https://doi.org/10.1016/j.simpat.2022.102519
Access Level:acceso abierto
Palabra clave:Cellular automata
Parallel Computing
Performance optimization
Stencil computation
Graphics Processing Units
Descripción
Sumario:Graphics Processing Units (GPUs) can be used as convenient hardware accelerators to speed up Cellular Automata (CA) simulations, which are employed in many scientific areas. However, an important set of CA have performance constraints due to GPU memory bandwidth. Few studies have fully explored how CA implementations can take advantage of modern GPU architectures, mainly in the case of intensive memory usage. In this paper, we make a thorough study of techniques (stencil computing framework, look-up tables, and packet coding) to efficiently implement CA on GPU, taking into account its detailed architecture. Exhaustive experiments to validate these implementation techniques for a number of significant memory bounded CA are performed. The CA analysed include the classical Game of Life, a Forest Fire model, a Cyclic cellular automaton, and the WireWorld CA. The experimental results show that implementations using the presented techniques can significantly outperform a baseline standard GPU implementation. The best performance results of all known implementations of memory bounded CA were obtained. Moreover, some of the techniques, like look-up tables or temporal blocking, are indeed relatively easy to implement or to apply when the transition rules are simple. Finally, detailed descriptions and discussions of the indicated techniques are included, which may be useful to practitioners interested in developing high performance simulations in efficient languages based on CA on GPU.