Static scheduling of the LU factorization with look-ahead on asymmetric multicore processors

Bibliographic details
Authors: Catalán Pallarés, Sandra; Herrero Zaragoza, José Ramón (ORCID: 0000-0002-4060-367X); Quintana Ortí, Enrique Salvador; Rodríguez Sánchez, Rafael
Format: article
Publication date: 2018
Country: Spain
Resources: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/133854
Online access: https://hdl.handle.net/2117/133854
https://dx.doi.org/10.1016/j.parco.2018.04.006
Access level: open access
Keywords: Multiprocessors -- Programming
Dense linear algebra
LU factorization
Look-ahead
Asymmetric multicore processors
Multi-threading
Frequency scaling
UPC subject areas::Computer science::Computer architecture
Description
Abstract: We analyze the benefits of look-ahead in the parallel execution of the LU factorization with partial pivoting (LUpp) in two distinct “asymmetric” multicore scenarios. The first one corresponds to an actual hardware-asymmetric architecture such as the Samsung Exynos 5422 system-on-chip (SoC), equipped with an ARM big.LITTLE processor consisting of a quad-core Cortex-A15 cluster plus a quad-core Cortex-A7 cluster. For this scenario, we propose a careful mapping of the different types of tasks appearing in LUpp to the two core clusters, in order to exploit the computational resources integrated in this SoC in an efficient, architecture-aware manner. The second asymmetric configuration appears in a hardware-symmetric multicore architecture where the cores can individually operate at different frequency levels. In this scenario, we show how to employ the frequency slack to accelerate the tasks in the critical path of LUpp in order to produce a faster global execution as well as a lower energy consumption.