Antonio Ruiz, Manuel Ujaldón Martínez
This work analyzes the most advanced features of NVIDIA's Kepler GPU, mainly dynamic parallelism, which allows kernels to be launched from within the GPU itself, and thread scheduling via Hyper-Q. We illustrate several ways to exploit those features in a code that computes Zernike moments, using two different formulations: direct and iterative. This lets us compare how well each deploys parallelism on the new generation of GPUs. The direct alternative tries to maximize parallelism, while the iterative one increases operational intensity by reusing results from previous iterations. This has allowed us to increase the speed-up factor attained on Fermi architectures versus a C code executed on a multicore CPU. We also succeed in identifying the critical workload a code requires to improve its execution on the new GPU platforms, endowed with six times more computational cores, and we quantify the overhead introduced by the new dynamic parallelism mechanisms in CUDA.
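The dynamic parallelism mechanism discussed above can be sketched as follows. This is a minimal illustration, not code from the paper: the kernel names, grid sizes, and workload are hypothetical. Dynamic parallelism requires a device of compute capability 3.5 or higher (Kepler GK110 onward) and compilation with relocatable device code enabled (`nvcc -rdc=true -arch=sm_35`).

```cuda
#include <cstdio>

// Hypothetical child kernel: each thread scales one element.
__global__ void childKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Parent kernel: with dynamic parallelism, a kernel running on the
// GPU can launch another kernel without returning to the host.
__global__ void parentKernel(float *data, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        // Device-side kernel launch (the feature Kepler introduced).
        childKernel<<<blocks, threads>>>(data, n);
        // Device-side synchronization with the child grid; supported
        // on Kepler-era CUDA, deprecated in recent CUDA releases.
        cudaDeviceSynchronize();
    }
}

int main() {
    const int n = 1024;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    // Host launches only the parent; the child is launched on-device.
    parentKernel<<<1, 1>>>(d_data, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```

Such device-side launches are what allow, for instance, each Zernike polynomial order to spawn its own child grid without a round trip to the CPU, at the cost of the launch overhead the abstract refers to.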