Enabling HW-based task scheduling in large multicore architectures

Bibliographic Details
Authors: Morais, Lucas Henrique; Álvarez Martínez, Carlos (ORCID: 0000-0003-0536-5183); Jiménez González, Daniel (ORCID: 0000-0001-6064-7883); De Haro Ruiz, Juan Miguel (ORCID: 0000-0002-7427-9118); Araujo, Guido; Frank, Michael; Goldman, Alfredo; Martorell Bofill, Xavier (ORCID: 0000-0002-0417-3430)
Resource type: article
Publication date: 2023
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/395689
Online access: https://hdl.handle.net/2117/395689
https://dx.doi.org/10.1109/TC.2023.3323781
Access level: open access
Keywords: Parallel processing (Electronic computers)
Parallel programming (Computer science)
Hardware acceleration
Task scheduling
RISC-V
Custom ISA
FPGA
UPC subject areas::Computer science::Computer architecture::Parallel architectures
Description
Summary: Dynamic Task Scheduling is an enticing programming model that aims to ease the development of parallel programs with intrinsically irregular or data-dependent parallelism. The performance of such solutions relies on the ability of the Task Scheduling HW/SW stack to efficiently evaluate dependencies at runtime and schedule work to available cores. Traditional SW-only systems incur scheduling overheads of around 30K processor cycles per task, which severely limit the (core count, task granularity) combinations that they can adequately handle. Previous work on HW-accelerated Task Scheduling has shown that such systems can support high-performance scheduling on processors with up to eight cores, but questions remained regarding whether such solutions can support the greater number of cores now frequently found in high-end SMP systems. This work presents an FPGA-proven, tightly integrated, Linux-capable, 30-core RISC-V system with hardware-accelerated Task Scheduling. We use this implementation to show that HW Task Scheduling can still offer competitive performance at such high core counts, and describe how this organization includes hardware and software optimizations that make it even more scalable than previous solutions. Finally, we outline ways in which this architecture could be augmented to overcome inter-core communication bottlenecks, mitigating the cache-degradation effects usually involved in the parallelization of highly optimized serial code.
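
The link between per-task overhead and the usable (core count, task granularity) combinations can be illustrated with a back-of-the-envelope sketch. It assumes, purely for illustration, that scheduling is fully serialized, so one task is released every `overhead_cycles` cycles; this model and the function names are hypothetical and are not taken from the article. Only the 30K-cycle figure and the 8- and 30-core counts come from the summary.

```python
# Illustrative model only: assumes a single, fully serialized scheduler.
# The article itself does not state this model; it is an assumption.

SW_OVERHEAD_CYCLES = 30_000  # per-task overhead quoted in the summary


def min_task_cycles(cores: int, overhead_cycles: int) -> int:
    """Smallest task length (in cycles) at which a serialized scheduler
    can release work fast enough to keep `cores` workers busy.

    The scheduler releases one task every `overhead_cycles` cycles, and
    `cores` workers together finish one task every task_cycles / cores
    cycles, so we need task_cycles >= cores * overhead_cycles.
    """
    return cores * overhead_cycles


def max_busy_cores(task_cycles: int, overhead_cycles: int) -> int:
    """Largest number of workers the same scheduler can keep busy
    when every task lasts `task_cycles` cycles."""
    return task_cycles // overhead_cycles


if __name__ == "__main__":
    for cores in (8, 30):
        needed = min_task_cycles(cores, SW_OVERHEAD_CYCLES)
        print(f"{cores} cores need tasks of at least {needed} cycles")
```

Under this assumption, the minimum viable task size grows linearly with the core count, which is one way to read the summary's claim that software overheads restrict the combinations a system can handle.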