Parallel scalability of face detection in heterogeneous multithreaded architectures

David Oro García

Author: David Oro García
Thesis advisors: Francisco Javier Hernando Pericas (advisor), Xavier Martorell Bofill (co-advisor)
Defense: Universitat Politècnica de Catalunya (UPC), Spain, 2020
Language: Spanish
Subjects:
- Mathematics
  - Computer science
    - Artificial intelligence
    - Programming languages
- Technological sciences
  - Computer technology
    - Real-time systems
Full text not available.
Abstract
- Facial recognition systems have recently become extremely popular, and deployments of this technology are now ubiquitous. Applications ranging from access control to automated surveillance of video feeds rely on facial recognition to precisely identify people at multiple locations. Modern facial recognition software targeting surveillance applications typically needs to analyze video streams in order to identify faces in crowds in real time. The first analytical step in a facial recognition system is face detection, which mainly involves determining the precise coordinates and dimensions of all faces appearing in a given image or video frame; it constitutes the first major bottleneck in the pipeline. Unlike other use cases such as image classification, which usually work well with VGA images, surveillance applications require high or ultra-high definition resolutions in order to locate and correctly identify people in crowds. Consequently, to maximize the chances of obtaining facial mugshots with enough quality and pixel density for accurate facial identification, it is essential to develop algorithms and heuristics capable of working with large images. The main challenge is to perform all the required computations in just a few milliseconds, so as not to slow down the subsequent stages of the facial recognition pipeline. In this thesis, we study several low-level parallelization techniques and kernels that efficiently solve the problem of face detection in a scalable manner on multithreaded data-parallel GPU architectures. The first part of the thesis covers a multilevel mechanism that exploits both coarse-grained and fine-grained parallelism, in combination with a smart use of local on-die memories, to reduce GPU underutilization when evaluating boosted cascades of ensembles over high-definition videos.
We demonstrate that our proposed parallelization strategy solves the problem of GPU underutilization and achieves a 5× speedup compared to methods relying on serialized kernel execution. The second part of the thesis presents a heuristic and a hybrid framework that combine hand-crafted features with state-of-the-art convolutional neural networks to address real-time face detection in videos at ultra-high definition resolutions (4K and 8K). The results show that the proposed heuristic achieves real-time throughput on challenging video datasets by combining binarized hand-crafted features, which discard regions that do not contain faces, with neural networks that further refine the face detection process. The third part of the thesis presents a novel parallel non-maximum suppression (NMS) algorithm targeting the on-die GPU architectures included in modern SoCs. The contributed algorithm relies on a boolean matrix and parallel reductions to handle workloads featuring thousands of simultaneous detections in a given picture or video frame. Finally, we demonstrate both formally and experimentally that the execution time of the proposed parallel NMS algorithm scales linearly as the number of GPU cores increases.
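To illustrate how NMS can be phrased in terms of a boolean matrix and reductions, here is a minimal sequential sketch. It is not the thesis algorithm, whose exact formulation the abstract does not give. The names `iou` and `matrix_nms`, the `(x1, y1, x2, y2)` box format, the score tie-breaking rule and the default overlap threshold of 0.5 are all hypothetical choices made for this example. Each matrix entry is independent of the others, and each row is collapsed with an OR-reduction; on a GPU, both steps are the parts that would map onto parallel threads and parallel reductions.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matrix_nms(boxes: List[Box], scores: List[float], threshold: float = 0.5) -> List[int]:
    """Hypothetical matrix-based NMS sketch; returns indices of kept detections."""
    n = len(boxes)
    # Step 1: boolean matrix. suppress[i][j] is True when detection j outranks
    # detection i (higher score, ties broken by lower index) and their IoU
    # exceeds the threshold. Every entry is independent of the others.
    suppress = [
        [
            j != i
            and (scores[j] > scores[i] or (scores[j] == scores[i] and j < i))
            and iou(boxes[i], boxes[j]) > threshold
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Step 2: OR-reduction over each row. A detection survives only if
    # no other detection suppresses it.
    return [i for i in range(n) if not any(suppress[i])]


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(matrix_nms(boxes, scores))
```

In this single-pass form, a box can be suppressed by a neighbor that is itself suppressed, so it may discard more boxes than classic greedy NMS. How the thesis handles that case is not stated in the abstract.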