Static scheduling of the LU factorization with look-ahead on asymmetric multicore processors
| Authors: | , , , |
|---|---|
| Format: | article |
| Publication date: | 2018 |
| Country: | Spain |
| Resources: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | English |
| OAI Identifier: | oai:upcommons.upc.edu:2117/133854 |
| Online access: | https://hdl.handle.net/2117/133854 https://dx.doi.org/10.1016/j.parco.2018.04.006 |
| Access Level: | open access |
| Keywords: | Multiprocessors -- Programming; Dense linear algebra; LU factorization; Look-ahead; Asymmetric multicore processors; Multi-threading; Frequency scaling; Multiprocessadors -- Programació; Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors |
| Abstract: | We analyze the benefits of look-ahead in the parallel execution of the LU factorization with partial pivoting (LUpp) in two distinct “asymmetric” multicore scenarios. The first one corresponds to an actual hardware-asymmetric architecture such as the Samsung Exynos 5422 system-on-chip (SoC), equipped with an ARM big.LITTLE processor consisting of a quad-core Cortex-A15 cluster plus a quad-core Cortex-A7 cluster. For this scenario, we propose a careful mapping of the different types of tasks appearing in LUpp to the computational resources, in order to produce an efficient architecture-aware exploitation of the computational resources integrated in this SoC. The second asymmetric configuration appears in a hardware-symmetric multicore architecture where the cores can individually operate at different frequency levels. In this scenario, we show how to employ the frequency slack to accelerate the tasks in the critical path of LUpp in order to produce a faster global execution as well as a lower energy consumption. |