Fetch unit design for scalable simultaneous multithreading (ScSMT)
| Authors: | , , |
|---|---|
| Resource type: | article |
| Status: | Published version |
| Publication date: | 2001 |
| Country: | Argentina |
| Institution: | Universidad Nacional de La Plata |
| Repository: | SEDICI (UNLP) |
| Language: | English |
| OAI Identifier: | oai:sedici.unlp.edu.ar:10915/9404 |
| Online access: | http://sedici.unlp.edu.ar/handle/10915/9404 |
| Access Level: | open access |
| Keywords: | Computer Science; Parallel processor; Threads; Processor architecture; Informatics |
| Summary: | Continuous IC process enhancements make it possible to integrate on a single chip the resources required to execute multiple control flows, or threads, simultaneously, exploiting different levels of thread-level parallelism: application-, function-, and loop-level. Scalable simultaneous multithreading combines static and dynamic mechanisms into a complexity-effective design that provides high instructions-per-cycle rates without sacrificing either cycle time or single-thread performance. This paper addresses the design of the fetch unit for a high-performance, scalable, simultaneous multithreaded processor. We present the detailed microarchitecture of a clustered and reconfigurable fetch unit based on an existing single-thread fetch unit. To minimize the occurrence of fetch hazards, the fetch unit dynamically adapts to the available thread-level parallelism and to the fetch characteristics of the active threads, working either as a single shared unit or as two separate clusters. It combines static and dynamic methods in a complexity-efficient way. The design is supported by a simulation-based analysis of different instruction cache and branch target buffer configurations in the context of a multithreaded execution workload. Average miss-rate reductions between 30% and 60%, and peak reductions greater than 200%, are obtained. |
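The summary states that the fetch unit switches between a single shared unit and two separate clusters depending on available thread-level parallelism and on the fetch characteristics of the active threads, but it does not give the selection rule. The Python sketch below only illustrates what such a reconfiguration decision could look like. It is not the authors' mechanism. Everything in it is a hypothetical assumption: the names `FetchMode`, `ThreadFetchStats`, `avg_fetch_block` and `choose_fetch_mode`, the default fetch width of 8, and the simple throughput model in which fetch from one thread ends at its first taken branch and each half-width cluster serves one thread per cycle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class FetchMode(Enum):
    SHARED = "shared"        # one full-width fetch path serving one thread per cycle
    CLUSTERED = "clustered"  # two half-width clusters, each serving a different thread


@dataclass(frozen=True)
class ThreadFetchStats:
    thread_id: int
    # Hypothetical metric: mean number of instructions fetched before a
    # taken branch ends the fetch block for this thread.
    avg_fetch_block: float


def shared_throughput(threads: Sequence[ThreadFetchStats], width: int) -> float:
    """Estimated instructions/cycle if the whole width serves the best single thread."""
    best = max(t.avg_fetch_block for t in threads)
    return min(best, width)


def clustered_throughput(threads: Sequence[ThreadFetchStats], width: int) -> float:
    """Estimated instructions/cycle if two half-width clusters fetch from the two best threads."""
    half = width // 2
    top_two = sorted((t.avg_fetch_block for t in threads), reverse=True)[:2]
    return sum(min(block, half) for block in top_two)


def choose_fetch_mode(threads: Sequence[ThreadFetchStats], width: int = 8) -> FetchMode:
    """Pick the mode with the higher estimated fetch throughput; ties favor SHARED."""
    if len(threads) < 2:
        # With zero or one active thread, splitting would only halve its fetch bandwidth.
        return FetchMode.SHARED
    if clustered_throughput(threads, width) > shared_throughput(threads, width):
        return FetchMode.CLUSTERED
    return FetchMode.SHARED


if __name__ == "__main__":
    # Two threads with short fetch blocks: two half-width clusters fetch 3.5 + 3.0
    # instructions per cycle versus 3.5 for the shared unit.
    short_blocks = [ThreadFetchStats(0, 3.0), ThreadFetchStats(1, 3.5)]
    print(choose_fetch_mode(short_blocks).value)  # clustered
```

Under this toy model, a lone thread always keeps the full fetch width, which mirrors the stated goal of preserving single-thread performance. Two threads with short fetch blocks favor clustering. One long-block thread paired with a very short-block thread can still favor the shared unit (for example, blocks of 10 and 1 give 8 versus 4 + 1 instructions per cycle).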