An energy-efficient memory unit for clustered microarchitectures

Bibliographic Details
Authors: Bieschewski, Stefan; Parcerisa Bundó, Joan Manuel (ORCID 0000-0001-5771-8118); González Colás, Antonio María (ORCID 0000-0002-0009-0996)
Resource type: article
Publication date: 2016
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/90303
Online access: https://hdl.handle.net/2117/90303
https://dx.doi.org/10.1109/TC.2015.2493518
Access level: open access
Keywords: Cache memory
Microprocessors
Cache memories
Parallel architectures
Distributed architectures
Clustered architectures
Store buffer
Memòria cau
Microprocessadors
UPC subject areas::Computer science::Computer architecture
Description
Summary: Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent challenges of clustered memory units and shows how these can be overcome. Clustered memory pipelines work well with the late allocation of load/store queue entries and physically unordered queues. Yet this approach has characteristic problems such as queue overflows and allocation patterns that lead to deadlocks. We propose techniques to solve each of these problems and show that a distributed memory unit can offer significant energy savings and speedups over a centralized unit. For instance, compared to a centralized cache with a load/store queue of 64/24 entries, our four-cluster distributed memory unit with load/store queues of 16/8 entries each consumes 31 percent less energy and performs 4.7 percent better on SPECint, and consumes 36 percent less energy and performs 7 percent better on SPECfp.