Dialnet


Compiler and runtime support for the execution of scientific codes with unstructured datasets on heterogeneous parallel architectures

  • Author: Pablo Barrio López Cortijo
  • Thesis supervisor: Carlos Carreras Vaquer
  • Defense: Universidad Politécnica de Madrid (Spain), 2017
  • Language: Spanish
  • Thesis committee: Manuel de Hermenegildo Salinas (chair), José Manuel Moya Fernández (secretary), Gabriel Caffarena Fernández (member), Christian Plessl (member), Francisco Javier Gómez Arribas (member)
  • Doctoral program: Doctoral Program in Electronic Systems Engineering, Universidad Politécnica de Madrid
  • Abstract
    • Simulation codes based on the discretization of time and space for solving systems of partial differential equations are used nowadays in relevant industrial sectors. These include, for example, the finite volume and finite element methods typically used in Computational Fluid Dynamics (CFD). Such simulation applications are widely used in industries such as aeronautics and automotive, as well as in weather prediction. More recently, they have been used in novel applications such as the simulation of blood flow for medical diagnosis or the design of wind turbines for energy generation. The complexity of these simulations and the sheer size of the datasets involved often require a considerable amount of computing power to obtain results in a reasonable time. In recent years, the optimization of these applications has been a priority for many companies, and even real-time performance is now often considered a stretch goal for optimization attempts. (A minimal sketch of the kind of unstructured-mesh kernel at the core of these codes is given after this abstract.)

      High Performance Computing (HPC) systems have been used for some time now to run these simulations, achieving reasonable speedups by partitioning the datasets and running the simulation on several processors. However, the need to run them in as little time as possible, the increase in dataset sizes to allow for higher accuracy, and limitations in the multi-process scalability of CFD simulations have resulted in joint worldwide efforts to achieve the ever-increasing performance and problem-size objectives by means of newer, disruptive technologies.

      This thesis approaches the problem of optimizing the execution of these codes with the help of heterogeneous HPC systems. These systems differ from standard, homogeneous HPC systems in that the architectures of the processing elements used as building blocks differ from each other. A special interest of this work is to analyze the feasibility of using Field-Programmable Gate Arrays (FPGAs) in these systems as accelerators for scientific simulations. Because these devices are essentially reconfigurable hardware, they allow finer-grained parallelism than general-purpose processors, which translates into higher throughput when computational kernels are ported to them. Additionally, FPGAs achieve levels of power efficiency that are currently unparalleled by any mainstream computing device. Unfortunately, the novelty of this approach means that the programming effort required to implement such solutions is still high compared to other well-known accelerating technologies such as General-Purpose Graphics Processing Units (GPGPUs).

      FPGAs also show limitations in external memory throughput. Given the data dependencies present in the datasets used by these codes, this problem could render FPGAs useless for the optimization of such applications. A deep optimization of the data transfer mechanisms is therefore vital to ensure that the high-performance capabilities of these reconfigurable devices are not overshadowed by their lower input/output capabilities compared to other processing devices.

      The algorithms, methodologies and software libraries introduced throughout this thesis improve the data transfers of the aforementioned codes in FPGA-based parallel heterogeneous systems, while also reducing the development effort required to implement them. Specifically, we propose a methodology to reduce the size of the datasets and transfer them efficiently to the FPGA, as well as two compiler and runtime techniques to automate the parallelization of these codes on heterogeneous systems: one focused on control-flow distribution and the other based on the pipelining of loop sequences (a sketch of this pipelining idea is given after this abstract). The latter are system-level techniques and are therefore independent of the architectures used in the heterogeneous system.

      Although the original purpose of this work was to optimize CFD simulations, it has become clear that the proposed techniques are applicable to a more general set of applications. In the future, HPC systems could use FPGAs in the same way that they now benefit from GPGPU technologies, but with the added benefits of reconfigurable hardware.
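
      As a concrete illustration of the first paragraph of the abstract, the following is a minimal, hypothetical C++ sketch (not taken from the thesis) of the kind of kernel these simulation codes are built around: an explicit, edge-based update over an unstructured mesh. The indirect accesses through the edge list are what make the dataset "unstructured" and the data transfers hard to optimize.

      // Minimal sketch of an explicit finite-volume-style update on an
      // unstructured mesh. Cell connectivity is an edge list, so the memory
      // accesses are indirect and data-dependent.
      #include <algorithm>
      #include <cstddef>
      #include <cstdio>
      #include <vector>

      struct Edge { std::size_t a, b; double coeff; };   // two cells sharing a face

      // One explicit time step: accumulate fluxes over edges, then update cells.
      void timeStep(const std::vector<Edge>& edges,
                    std::vector<double>& u,      // cell-centred unknowns
                    std::vector<double>& flux,   // scratch accumulator, same size as u
                    double dt)
      {
          std::fill(flux.begin(), flux.end(), 0.0);
          for (const Edge& e : edges) {          // indirect, irregular accesses
              double f = e.coeff * (u[e.b] - u[e.a]);
              flux[e.a] += f;
              flux[e.b] -= f;
          }
          for (std::size_t i = 0; i < u.size(); ++i)
              u[i] += dt * flux[i];
      }

      int main() {
          // Toy 4-cell mesh connected in a ring, just to exercise the kernel.
          std::vector<Edge> edges = {{0, 1, 1.0}, {1, 2, 1.0}, {2, 3, 1.0}, {3, 0, 1.0}};
          std::vector<double> u = {1.0, 0.0, 0.0, 0.0}, flux(4, 0.0);
          for (int step = 0; step < 100; ++step)
              timeStep(edges, u, flux, 0.1);
          std::printf("u = %f %f %f %f\n", u[0], u[1], u[2], u[3]);
      }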
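
      The second sketch, again hypothetical and host-side only, illustrates the general idea behind the pipelining of loop sequences mentioned in the contributions: the transfer of the next dataset partition is overlapped with the computation on the current one. The functions transfer() and compute() are stand-ins introduced for this example; in the heterogeneous setting they would correspond to moving a partition to the FPGA and launching the accelerated kernel on it.

      // Software pipeline over chunks of a partitioned dataset: stage the
      // transfer of chunk i+1 asynchronously while chunk i is being processed.
      #include <cstddef>
      #include <cstdio>
      #include <future>
      #include <numeric>
      #include <vector>

      using Chunk = std::vector<double>;

      Chunk transfer(Chunk c) {            // stand-in for a host-to-device copy
          return c;
      }

      double compute(const Chunk& c) {     // stand-in for the accelerated kernel
          return std::accumulate(c.begin(), c.end(), 0.0);
      }

      int main() {
          std::vector<Chunk> chunks(8, Chunk(1024, 1.0));   // partitioned dataset
          double total = 0.0;

          auto staged = std::async(std::launch::async, transfer, chunks[0]);
          for (std::size_t i = 0; i < chunks.size(); ++i) {
              Chunk ready = staged.get();                   // wait for chunk i
              if (i + 1 < chunks.size())
                  staged = std::async(std::launch::async, transfer, chunks[i + 1]);
              total += compute(ready);                      // overlaps with next transfer
          }
          std::printf("total = %f\n", total);
      }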

