PerfBound: Conserving Energy with Bounded Overheads in On/Off-Based HPC Interconnects

Bibliographic details
Authors: Saravanan, Karthikeyan P.; Carpenter, Paul Matthew
Format: article
Publication date: 2018
Country: Spain
Resources: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/125138
Online access: https://hdl.handle.net/2117/125138
https://dx.doi.org/10.1109/TC.2018.2790394
Access level: open access
Keywords: High performance computing
Energy Efficient Interconnects
Energy Efficient Ethernet
Fast-Wake
Deep-Sleep
Supercomputers
UPC subject areas::Computer science
Description
Abstract: Energy and power are key challenges in high-performance computing. System energy efficiency must be significantly improved, and this requires greater efficiency in all subcomponents. An important target of optimization is the interconnect, since network links are always on, consuming power even during idle periods. A large number of HPC machines have a primary interconnect based on Ethernet (about 40 percent of TOP500 machines), which, since 2010, has included support for saving power via Energy Efficient Ethernet (EEE). Nevertheless, it is unlikely that HPC interconnects would use these energy saving modes unless the performance overhead is known and small. This paper presents PerfBound, a self-contained technique to manage on/off-based networks such as EEE, minimizing interconnect link energy consumption subject to a bound on the performance degradation. PerfBound does not require changes to the applications and it uses only local information already available at switches and NICs without introducing additional communication messages, and is also compatible with multi-hop networks. PerfBound is evaluated using traces from a production supercomputer. For twelve out of fourteen applications, PerfBound has high energy savings, up to 70 percent for only 1 percent performance degradation. This paper also presents DynamicFastwake, which extends PerfBound to exploit multiple low-power states. DynamicFastwake achieves an energy-delay product 10 percent lower than the original PerfBound technique.