Summary of Clustered multithreaded processors

Fernando Latorre Salinas

Both industry and academia are shifting towards multi-core architectures. This shift is mainly motivated by two factors. On the one hand, we have reached a point where further exploiting instruction level parallelism (ILP) yields diminishing returns, so other types of parallelism are needed. On the other hand, smaller feature sizes allow a greater number of transistors to be implemented on a chip. This increase in the number of transistors opens the possibility of integrating multiple cores on a die, so that multiple applications can run in parallel and achieve good performance by exploiting thread level parallelism (TLP).

On-die multi-core architectures are very promising hardware designs, well aligned with the current efforts of software programmers to parallelize applications. Indeed, it is very likely that most of the workloads we run in the future will comprise multiple parallel threads. Therefore, making multi-core designs efficient at exploiting all the TLP available in these workloads is crucial in order to keep raising processor performance.

However, the number of parallel applications today is limited, and some algorithms are difficult to parallelize. For this reason, it is likely that future processors will also have to deal with a certain number of single-threaded applications. Therefore, new processor designs should also exploit ILP effectively when TLP is limited and execute single-threaded applications at a reasonable speed.

In conclusion, future architectures should be able to efficiently exploit both kinds of parallelism: ILP and TLP. Unfortunately, processors designed to exploit TLP are usually inefficient at exploiting ILP, and vice versa. For instance, TLP is usually exploited by implementing multiple cores on a die. However, the more cores we implement on a chip, the simpler each core must be in order to fit in a reasonable area and operate within a given power budget. Thus, increasing the number of cores allows us to exploit more TLP, but since the complexity of every core is reduced, each core loses some of its ability to exploit ILP. Conversely, cores that efficiently exploit ILP are usually power hungry and require significant area, making it infeasible to implement many of them on the same die.
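The trade-off described above can be illustrated with a small back-of-the-envelope model. This is only a sketch under a stated assumption that does not come from the abstract: single-thread performance of a core is taken to grow with the square root of its area (the heuristic commonly known as Pollack's rule). The function name `chip_throughput` and the area budget of 16 units are hypothetical, chosen for illustration.

```python
import math


def chip_throughput(area_budget, n_cores, n_threads):
    """Relative throughput of a chip split into n_cores equal cores.

    Assumption (not from the abstract): per-core single-thread
    performance scales with sqrt(core area), i.e. Pollack's rule.
    Each thread runs on its own core; cores without a thread are idle.
    """
    core_area = area_budget / n_cores
    per_core_perf = math.sqrt(core_area)
    busy_cores = min(n_cores, n_threads)
    return busy_cores * per_core_perf


AREA = 16.0  # hypothetical area budget, arbitrary units

if __name__ == "__main__":
    for threads in (1, 16):
        for cores in (1, 4, 16):
            perf = chip_throughput(AREA, cores, threads)
            print(f"threads={threads:2d} cores={cores:2d} throughput={perf}")
```

Under this simplified model, one large core is the best choice for a single thread, while sixteen small cores are the best choice for sixteen threads. No fixed partition of the area is best in both cases, which is the motivation for the adaptive designs discussed next.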

Adaptive architectures are a possible way to exploit both kinds of parallelism, ILP and TLP. An adaptive architecture reconfigures its hardware to better match the characteristics of the workloads being executed. Some novel designs suitable for adaptive architectures comprise simple components that interact with each other so as to behave as multiple small cores when many threads are available to run in parallel, or as fewer, more complex cores when ILP should be exploited.

In this thesis we have proposed and evaluated several alternatives for designing adaptive architectures able to exploit the kind of parallelism available in each application. Our proposals take advantage of two paradigms that are extremely useful for achieving this hardware adaptability: clustering and multithreading. On the one hand, clustering allows building small, power-efficient hardware components that interact with each other in order to behave as a wide machine if needed. On the other hand, multithreading allows executing multiple threads in parallel.
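As a rough illustration of how clustering and multithreading can combine, the sketch below groups a fixed pool of identical clusters into logical cores according to the number of runnable threads. It is a hypothetical policy written for this summary, not the mechanism proposed in the thesis; the function name `group_clusters` and the even-split rule are assumptions.

```python
def group_clusters(n_clusters, n_threads):
    """Group identical clusters into logical cores for the given threads.

    Hypothetical policy (an assumption, not the thesis's scheme):
    - with few threads, each thread gets a wide core built from
      several clusters, which favours ILP;
    - with at least as many threads as clusters, every cluster acts
      as its own small core and extra threads must share clusters
      through multithreading, which favours TLP.

    Returns a list with the number of clusters in each logical core.
    """
    if n_clusters < 1 or n_threads < 1:
        raise ValueError("need at least one cluster and one thread")
    n_cores = min(n_clusters, n_threads)
    # Spread clusters as evenly as possible across the logical cores.
    base, extra = divmod(n_clusters, n_cores)
    return [base + 1 if i < extra else base for i in range(n_cores)]


if __name__ == "__main__":
    for threads in (1, 3, 16):
        print(threads, group_clusters(8, threads))
```

With eight clusters, a single thread gets one eight-cluster core, three threads get cores of three, three and two clusters, and sixteen threads are served by eight single-cluster cores that they share.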

The main contributions of this thesis are:

-A thorough evaluation of different memory hierarchies for clustered architectures.

-The proposal of a novel steering scheme for clustered architectures that outperforms the state of the art.

-An alternative hardware design for the Reorder Buffer in order to allow a larger number of in-flight instructions. This increase in the number of in-flight instructions allows a better use of the vast number of hardware resources available in a clustered processor.

-A novel adaptive multi-core architecture comprising simple front-end and back-end components, in which the clustering and multithreading paradigms are combined in order to better exploit the type of parallelism present in the application. This work was extended with the proposal of a novel hardware scheme that allows multiple front-ends to steer instructions to the same back-ends without synchronizing their renaming stages.

-An effective resource assignment scheme for clustered simultaneous multithreading processors in order to take maximum advantage of the hardware resources available in a clustered architecture when multiple threads are executed in parallel.
