Visibility rendering order: Improving energy efficiency on mobile GPUs through frame coherence

Bibliographic Details
Authors: Lucas Casamayor, Enrique de; Marcuello Pascual, Pedro; Parcerisa Bundó, Joan Manuel (ORCID 0000-0001-5771-8118); González Colás, Antonio María (ORCID 0000-0002-0009-0996)
Resource type: article
Publication date: 2018
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/122559
Online access: https://hdl.handle.net/2117/122559
https://dx.doi.org/10.1109/TPDS.2018.2866246
Access level: open access
Keywords: Image processing -- Digital techniques
Color computer graphics
Rendering (Computer graphics)
Graphics processing units
GPU
Graphics Pipeline
Energy-efficiency
Rasterization
Rendering
Fragment processing
Pixel shading
Occlusion culling
Visibility
Tile based deferred rendering
Tile based rendering
Topological order
UPC subject areas::Computer science::Computer graphics
Description
Summary: During real-time graphics rendering, objects are processed by the GPU in the order in which the CPU submits them, and occluded surfaces are often processed even though they will not be part of the final image, wasting time and energy. To help discard occluded surfaces, most current GPUs include an Early-Depth test before the fragment processing stage. To be effective, however, this test requires that opaque objects be processed in front-to-back order. Depth sorting and other occlusion culling techniques at the object level incur overheads that are only offset in applications with substantial depth and/or fragment shading complexity, which is often not the case in mobile workloads. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware by exploiting the fact that objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence). Since order relationships are already tested by the Depth Test, VRO incurs minimal energy overhead: it only requires adding a small hardware unit that captures this information and uses it later to guide the rendering of the following frame. Moreover, unlike other approaches, this unit works in parallel with the graphics pipeline without any performance overhead. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average over a state-of-the-art mobile GPU.
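The idea in the summary can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the function names (`record_depth_edges`, `front_to_back_order`, `shaded_fragments`), the toy scene, and the cycle fallback are all hypothetical. It records "A is in front of B" relations from depth-test outcomes in one frame, derives a topological order from them, and counts how many fragments would pass an early depth test if the next frame (assumed to keep the same relative depths) were rendered in that order.

```python
from collections import defaultdict, deque


def record_depth_edges(fragments):
    """Collect (front, back) pairs observed by a per-pixel depth test.

    fragments: iterable of (pixel, object_id, depth) in submission order;
    a smaller depth means closer to the camera.
    """
    nearest = {}  # pixel -> (depth, object_id) of the closest fragment so far
    edges = set()
    for pixel, obj, depth in fragments:
        if pixel not in nearest:
            nearest[pixel] = (depth, obj)
            continue
        cur_depth, cur_obj = nearest[pixel]
        if obj == cur_obj:
            nearest[pixel] = (min(depth, cur_depth), obj)
        elif depth < cur_depth:
            edges.add((obj, cur_obj))      # new fragment occludes stored one
            nearest[pixel] = (depth, obj)
        else:
            edges.add((cur_obj, obj))      # stored fragment occludes new one
    return edges


def front_to_back_order(objects, edges):
    """Kahn's topological sort; ties follow the original submission order."""
    successors = defaultdict(set)
    indegree = {obj: 0 for obj in objects}
    for front, back in edges:
        successors[front].add(back)
        indegree[back] += 1
    ready = deque(obj for obj in objects if indegree[obj] == 0)
    order = []
    while ready:
        obj = ready.popleft()
        order.append(obj)
        for back in sorted(successors[obj], key=objects.index):
            indegree[back] -= 1
            if indegree[back] == 0:
                ready.append(back)
    # Assumption: objects caught in a cycle keep their submission order.
    placed = set(order)
    order.extend(obj for obj in objects if obj not in placed)
    return order


def shaded_fragments(order, fragments):
    """Count fragments that pass an early depth test for a given object order."""
    by_object = defaultdict(list)
    for pixel, obj, depth in fragments:
        by_object[obj].append((pixel, depth))
    nearest = {}
    shaded = 0
    for obj in order:
        for pixel, depth in by_object[obj]:
            if depth < nearest.get(pixel, float("inf")):
                nearest[pixel] = depth
                shaded += 1
    return shaded


# Toy frame on a 4-pixel row, submitted back to front (hypothetical data).
submission = ["wall", "crate", "character"]
fragments = (
    [(p, "wall", 0.9) for p in range(4)]
    + [(p, "crate", 0.5) for p in range(1, 4)]
    + [(p, "character", 0.2) for p in range(2, 4)]
)

order = front_to_back_order(submission, record_depth_edges(fragments))
print(order)                                   # ['character', 'crate', 'wall']
print(shaded_fragments(submission, fragments)) # 9
print(shaded_fragments(order, fragments))      # 4
```

In this toy scene the reordered frame shades 4 fragments instead of 9, because every occluded fragment fails the early depth test. The saving depends entirely on the assumption that relative depth order carries over to the next frame; the figures reported in the summary come from the authors' evaluation, not from this model.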