Extending Magny-Cours Cache Coherence

Bibliographic Details
Authors: Ros Bardisa, Alberto; Cuesta Sáez, Blas Antonio; Fernández-Pascual, Ricardo; Acacio Sánchez, Manuel E.; Robles Martínez, Antonio; García Carrasco, José Manuel; Duato Marín, José Francisco; Gómez Requena, María Engracia (ORCID: 0000-0003-1466-4118)
Resource type: article
Publication date: 2012
Country: Spain
Institution: Universitat Politècnica de València (UPV)
Repository: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Language: English
OAI identifier: oai:riunet.upv.es:10251/36257
Online access: https://riunet.upv.es/handle/10251/36257
Access level: open access
Keywords: High-performance computing
Cache coherence
Coherence extension
Directory protocol
Scalability
Shared memory
Traffic filtering
Computer architecture and technology
Description
Summary: One cost-effective way to meet the increasing demand for larger high-performance shared-memory servers is to build clusters from off-the-shelf processors connected by low-latency point-to-point interconnects such as HyperTransport. Unfortunately, HyperTransport addressing limitations prevent building systems with more than eight nodes. While the recent High-Node Count HyperTransport specification overcomes this limitation, the recently launched twelve-core Magny-Cours processors have already inherited it: they provide only 3 bits to encode the pointers used by the directory cache, which they include to increase the scalability of their coherence protocol. In this work, we propose and develop an external device that extends the coherence domain of Magny-Cours processors beyond the 8-node limit while maintaining the advantages provided by the directory cache. Evaluation results for systems with up to 32 nodes show that the performance offered by our solution scales with the number of nodes, enhancing the effectiveness of the directory cache by filtering additional messages. In particular, we reduce execution time by 47 percent in a 32-die system with respect to the 8-die Magny-Cours configuration.
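
The 8-node limit mentioned in the summary follows directly from the 3-bit pointer width. The short sketch below only illustrates that arithmetic; the function names are hypothetical and it does not model the external device proposed in the article.

```python
def max_nodes(pointer_bits: int) -> int:
    """Number of distinct node IDs a pointer of this width can encode."""
    return 2 ** pointer_bits


def pointer_bits_needed(nodes: int) -> int:
    """Smallest pointer width (in bits) that can name every one of `nodes` nodes."""
    # (nodes - 1).bit_length() is an exact integer form of ceil(log2(nodes)).
    return max(1, (nodes - 1).bit_length())


if __name__ == "__main__":
    # A 3-bit directory pointer can distinguish at most 8 nodes.
    print("3-bit pointer addresses", max_nodes(3), "nodes")
    # Naming each node of a 32-node system directly would take 5 bits.
    print("32 nodes need", pointer_bits_needed(32), "bits")
```

Under this simple model, a 32-node system would need 5-bit pointers if every node were named directly, which is two bits more than the 3 bits the summary says the processors provide.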