Dialnet


High performance computing for genomics

  • Authors: Esteban Pérez Wohlfeil
  • Thesis supervisors: Nicolas Guil Matas (supervisor), Eladio Damián Gutiérrez Carrasco (tutor)
  • Defended: at the Universidad de Málaga (Spain) in 2023
  • Language: English
  • Parallel titles:
    • Computación de alto rendimiento para la genómica
  • Thesis examination committee: Oscar Plata González (chair), José María Carazo García (secretary), Ulrich Bodenhofer (member)
  • Doctoral program: Programa de Doctorado en Tecnologías Informáticas por la Universidad de Málaga
  • Abstract
    • With the rise of new data acquisition methods, computerized research has become increasingly common over the last decades. However, to match the huge data-processing demands, designing new algorithms and optimizing them for specific hardware platforms (such as Graphics Processing Units) has become a necessity, especially given the costs of running and maintaining large supercomputing infrastructures, in the case of both dedicated and cloud computing servers. This is particularly true of comparative genomics, where massive DNA sequences are published on a daily basis and their processing presents a series of computational bottlenecks. As one of the principal use cases, researchers need to calculate alignments between pairs of sequences (which can be generalized as a multiple Longest Common Subsequence problem) with the aim of determining structurally similar regions.

      On the one hand, the comparison of DNA sequences is central to many applications with a direct impact on human health, and its computational acceleration is therefore of wide interest to both the scientific community and humanity as a whole. On the other hand, due to its arbitrary nature, the parallel acceleration of sequence comparison poses interesting computational challenges, including heterogeneous granularity, unpredictable load, and balancing and synchronization mechanisms. Therefore, to achieve high performance, algorithms must be tailored to the underlying hardware processing model, which may call for completely different computational approaches (e.g. transforming task parallelism into data parallelism) and often requires a complete redesign of the algorithms themselves.

      This thesis presents a computational tour of the sequence comparison problem, making use of hardware and algorithmic optimizations on single-core machines, shared-memory systems and Graphics Processing Units. In particular, the first contribution features a statistical and mathematical framework that enables a virtually unlimited search space size in sequence comparison thanks to its novel hashing methodology and strictly O(n) time complexity. The second contribution describes a parallel approach using shared-memory machines to achieve high sensitivity in the comparison of noisy and partially incomplete metagenomic sequences. The third contribution describes how to overcome the limitations of the data-parallelism model on GPUs for exhaustive and irregular pairwise sequence comparison. Lastly, a fourth contribution that combines all of the previous knowledge is presented, featuring Machine-Learning-aided schedulers that improve resource allocation and throughput in supercomputers dedicated to sequence comparison.

      The direct results of this thesis are twofold: from a computational perspective, new High Performance Computing methodologies, data structures and parallel mechanisms are proposed for a variety of hardware architectures; and from a comparative genomics perspective, the complexity of the sequence comparison problem has been effectively lowered while providing exhaustive and heuristic approaches that can be run on both commodity and specialized hardware.

