Abstract of Understanding and reducing contention in generalized fat tree networks for high performance computing

German Rodríguez Herrera

The network infrastructure accounts for an important fraction of the complexity and performance of High Performance Computing (HPC) supercomputers. Network performance helps bridge the gap between the nominal performance, i.e., the aggregate performance of the individual processing elements, and the effective performance, i.e., the measured system performance for a given workload. The role of the network in a supercomputer should not be underestimated: poor network performance can undermine global system performance, whereas using network resources optimally can, in turn, lower cost and power consumption.

A large body of research on improving network performance bases its experimental analysis on the continuous injection of synthetic traffic, on the assumption that optimizing for a very demanding, continuous and dynamic workload also optimizes the traffic of HPC applications. That work led to a recommendation for a random distribution of paths in the network to balance the load among switches. Our study, based on the traffic of HPC applications on a production system, shows that such a random distribution of routes is not advisable in general, as it greatly decreases network performance.

Network cost and power consumption have also been addressed in the literature. Recent work has pointed out that popular network topologies such as k-ary n-trees are overdesigned for HPC traffic, suggesting the exploration of more cost-effective topologies (such as slimmed trees). We show that, rather than overdesigned, k-ary n-tree topologies are actually underutilized: even applications that would benefit from the "overdesigned" state do not achieve their maximum performance because of inappropriate routing schemes. Still, if the network cost is to be reduced and the performance expectation is lowered to a fraction of what a full k-ary n-tree can achieve, we show that cost can be cut significantly, since the performance of HPC applications drops in a step-wise manner as network hardware is reduced.

A certain amount of contention is inherent to the application's communication pattern and to the topology itself, and cannot be eliminated through routing. Separating this inherent contention of the application/topology pair from the contention introduced by the routing scheme turns that downside into an advantage: the inherent, unavoidable contention can be used to set a less restrictive optimization goal that makes it possible to devise better pattern-aware and oblivious routing algorithms, and to reduce network cost by eliminating (or shutting down) unnecessary resources that will not increase performance.

The main contributions of this thesis can be summarized as follows:

i) A co-simulation infrastructure that allows fast simulation of all the relevant details of a parallel application, from the MPI level down to a very detailed model of the network technology.

ii) A deeper understanding of network contention, leading to the proposal of an offline contention metric used by our new pattern-aware routing heuristics.

iii) An analysis of the performance of HPC applications on full-bisection and slimmed fat trees, comparing oblivious against pattern-aware routing schemes.

iv) A combinatorial analysis of two previously proposed oblivious routing schemes (D-mod-k and S-mod-k) whose performance is close to optimal for some applications' communication patterns and extremely poor for others; we identify and explain the reasons for the poor performance on the pathological patterns (an illustrative sketch of these schemes follows the list).

v) A generalization of the analyzed oblivious routing algorithms that improves performance for pathological patterns without harming performance for favorable patterns.
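The D-mod-k and S-mod-k schemes mentioned in contributions iv) and v) are deterministic, oblivious rules that pick the upward port at each switch stage of a fat tree from the destination or the source identifier, respectively. The Python sketch below illustrates only the general idea; the node numbering, stage indexing and modulo formula are assumptions made for illustration and are not taken from the thesis.

```python
# Illustrative sketch only (not taken from the thesis): the general idea of
# D-mod-k and S-mod-k upward-port selection in a k-ary n-tree.
# Assumptions made here for illustration: nodes are numbered 0 .. k**n - 1,
# each switch stage offers k upward ports, and the port used at stage
# `level` (0 = leaf switches) is a modulo function of the destination
# (D-mod-k) or the source (S-mod-k) identifier.

def dmodk_up_port(dst: int, level: int, k: int) -> int:
    """Upward port chosen at switch stage `level` for destination `dst`."""
    return (dst // k ** level) % k


def smodk_up_port(src: int, level: int, k: int) -> int:
    """Same deterministic rule, keyed on the source identifier instead."""
    return (src // k ** level) % k


def up_hops(src: int, dst: int, k: int) -> int:
    """Stages a packet must climb before reaching a nearest common ancestor
    of `src` and `dst`; the downward path from there is unique."""
    h = 0
    while src // k ** (h + 1) != dst // k ** (h + 1):
        h += 1
    return h


if __name__ == "__main__":
    k, n = 4, 3                      # hypothetical 4-ary 3-tree, 64 nodes
    src, dst = 5, 42
    ups = [dmodk_up_port(dst, lvl, k) for lvl in range(up_hops(src, dst, k))]
    print(f"D-mod-k upward ports for {src} -> {dst}: {ups}")
```

Because the port choice depends only on one endpoint identifier, every switch decides consistently without global coordination, which is what makes such schemes oblivious. Intuitively, a communication pattern becomes pathological for such a rule when many concurrently communicating pairs are mapped onto the same upward links; the analysis and the generalization referred to in contributions iv) and v) address exactly this situation.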

