Efficient TLB-Based Detection of Private Pages in Chip Multiprocessors

Bibliographic Details
Authors: Esteve García, Albert; Ros Bardisa, Alberto; Robles Martínez, Antonio; Duato Marín, José Francisco; Gómez Requena, María Engracia (ORCID: 0000-0003-1466-4118)
Resource type: article
Publication date: 2016
Country: Spain
Institution: Universitat Politècnica de València (UPV)
Repository: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Language: English
OAI Identifier: oai:riunet.upv.es:10251/81519
Online access: https://riunet.upv.es/handle/10251/81519
Access level: open access
Keywords: Multiprocessor
Cache coherence
Directory cache
Coherence deactivation
TLB decay
COMPUTER ARCHITECTURE AND TECHNOLOGY
Description
Summary: Most of the data referenced by sequential and parallel applications running on current chip multiprocessors are referenced by a single thread, i.e., they are private. Recent proposals leverage this observation to improve many aspects of chip multiprocessors, such as reducing coherence overhead or the access latency to distributed caches. The effectiveness of those proposals depends to a large extent on the amount of private data detected. However, the mechanisms proposed so far consider neither thread migration nor the private use of data within different application phases. As a result, a considerable amount of private data goes undetected. To increase the detection of private data, we propose a TLB-based mechanism that accounts for both thread migration and application phases. Simulation results show that the average number of pages detected as private increases significantly, from 43 percent in previous proposals to 79 percent in ours, while keeping a reasonable TLB miss rate. Furthermore, when our proposal is used to deactivate coherence for private data in a directory protocol, it improves execution time by 13.5 percent, on average, with respect to previous techniques.