Semantic segmentation for real-world applications

Iñigo Alonso Ruiz

Author: Iñigo Alonso Ruiz
Thesis supervisors: Luis Montesano del Campo (supervisor), Ana Cristina Murillo Arnal (supervisor)
Defense: Universidad de Zaragoza (Spain), 2021
Language: Spanish
Thesis committee: Antonio M. López Peña (chair), Javier Civera Sancho (secretary), Jana Kosecka (member)
Doctoral program: Programa de Doctorado en Ingeniería de Sistemas e Informática por la Universidad de Zaragoza
Subjects:
- Mathematics
  - Computer science
    - Information systems, design and components
Links
- Open-access thesis at: Zaguán
Abstract
- In computer vision, scene understanding aims to extract useful information about a scene from raw sensor data. For instance, it can classify the whole image into a particular category (e.g., kitchen or living room) or identify important elements within it (e.g., bottles or cups on a table, or surfaces). In this general context, semantic segmentation assigns a semantic label to every single element of the raw data, e.g., to every pixel of an image or every point of a point cloud.
  
  This information is essential for many applications that rely on computer vision, such as augmented reality (AR), driving, medical, and robotic applications. It gives computers the understanding of the environment they need to make autonomous decisions, and it gives detailed information to people interacting with intelligent systems.
  
  The current state of the art for semantic segmentation is led by supervised deep learning methods.
  
  However, real-world scenarios and conditions introduce several challenges and restrictions for the application of these semantic segmentation models. This thesis tackles three of these challenges: 1) the limited amount of labeled data available for training deep learning models, 2) the time and computation restrictions present in real-time applications and/or in systems with limited computational power, such as a mobile phone or an IoT node, and 3) the ability to perform semantic segmentation with sensors other than the standard RGB camera.
  
  The general contributions presented in this thesis are the following:
  
  A novel approach to training semantic segmentation models from sparse annotations, which addresses the problem of limited annotated data. Fully supervised deep learning models lead the state of the art, but we show how to train them using only a few sparsely labeled pixels in the training images. Our approach achieves performance similar to that of models trained with fully labeled images. We demonstrate the relevance of this technique in environmental monitoring scenarios, where sparse image labels provided by human experts are very common, as well as in more general domains.
  
  Also dealing with limited training data, we propose a novel method for semi-supervised semantic segmentation, i.e., for the case where only a small number of fully labeled images and a large set of unlabeled data are available. We demonstrate how contrastive learning can be applied to the semantic segmentation task and show its advantages, especially when labeled data is scarce. Our approach improves on state-of-the-art results, showing the potential of contrastive learning in this task. Learning from unlabeled data opens great opportunities for real-world scenarios because it is an economical solution.
  
  Novel efficient image semantic segmentation models. We develop semantic segmentation models that are efficient in execution time, memory requirements, and computation requirements. Some of our models are able to run on a CPU at high speed with high accuracy. This is very important for real set-ups and applications, since high-end GPUs are not always available. Building models that consume fewer resources, less memory, and less time increases the range of applications that can benefit from them.
  
  Novel methods for semantic segmentation with non-RGB sensors.
  
  We propose a novel method for LiDAR point cloud segmentation that combines efficient learning operations in both 2D and 3D. It surpasses state-of-the-art segmentation performance while running at very high speed. We also show how to improve the robustness of these models by tackling the overfitting and domain adaptation problems.
  
  In addition, we present the first work on semantic segmentation with event-based cameras, coping with the lack of labeled data.
  
  To increase the impact of these contributions and ease their application in real-world settings, we have made an open-source implementation of all the proposed solutions available to the scientific community.
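
As an illustration of the sparse-annotation setting mentioned in the abstract, the sketch below shows one common baseline for it: a per-pixel cross-entropy loss computed only over the labeled pixels. This is not the thesis's method, which the abstract does not detail. The function name `sparse_cross_entropy` and the `IGNORE` marker are hypothetical names chosen for this example, and NumPy is assumed to be available.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for pixels that carry no annotation


def sparse_cross_entropy(logits, labels, ignore=IGNORE):
    """Mean cross-entropy over the labeled pixels only.

    logits: float array of shape (H, W, C), one score per class and pixel.
    labels: int array of shape (H, W); unlabeled pixels hold `ignore`.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    # Keep only the pixels that actually have a label.
    mask = labels != ignore
    if not mask.any():
        return 0.0

    pixel_log_probs = log_probs[mask]   # shape (N, C)
    pixel_labels = labels[mask]         # shape (N,)
    picked = pixel_log_probs[np.arange(pixel_labels.size), pixel_labels]
    return float(-picked.mean())


# Usage: a 4x4 image with 3 classes and only two annotated pixels.
rng = np.random.default_rng(0)
example_logits = rng.normal(size=(4, 4, 3))
example_labels = np.full((4, 4), IGNORE)
example_labels[0, 0] = 2
example_labels[3, 1] = 0
example_loss = sparse_cross_entropy(example_logits, example_labels)
```

Because unlabeled pixels are masked out, changing the predictions at those pixels does not change the loss, so the supervision signal comes only from the sparse annotations.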
