Spatio-temporal convolutional neural networks for video object detection
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10347/29792
Files in this item
Item metadata
Title: | Spatio-temporal convolutional neural networks for video object detection |
Author: | Cores Costa, Daniel |
Supervisors: | Brea Sánchez, Víctor Manuel; Mucientes Molina, Manuel |
Centre/Department: | Universidade de Santiago de Compostela. Escola de Doutoramento Internacional (EDIUS); Universidade de Santiago de Compostela. Programa de Doutoramento en Investigación en Tecnoloxías da Información |
Keywords: | video object detection | convolutional neural networks |
Date: | 2022 |
Abstract: | The object detection problem comprises two main tasks: object localization and object classification. Detection precision in still images has improved greatly with the use of deep learning techniques, especially with the adoption of convolutional neural networks. However, object detection in videos presents new challenges, such as motion blur, defocus, and object occlusions, that deteriorate object features in specific frames. Moreover, traditional object detectors do not exploit the spatio-temporal information that can be crucial to addressing these challenges and boosting detection precision. Hence, new object detection frameworks specifically designed for videos are needed to replicate the success achieved in the single-image domain. The availability of spatio-temporal information unlocks the possibility of analyzing long- and short-term relations among detections at different time steps. This greatly improves object classification precision in deteriorated frames, in which a single-image object detector would not be able to provide the correct object category. We propose new methods to establish these relations and aggregate information from different frames, proving through experimentation that they improve on single-image baselines and previous video object detectors. In addition, we explore the utility of spatio-temporal information for reducing the number of training examples while keeping competitive detection precision. This makes our proposal applicable in domains in which training data is scarce and, in general, reduces annotation costs. |
Embargo date: | 2023-11-21 |
URI: | http://hdl.handle.net/10347/29792 |
Rights: | Attribution-NonCommercial-NoDerivatives 4.0 International |
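The abstract's central idea, aggregating detection features from neighbouring frames weighted by their similarity to the current frame, can be sketched as below. This is a generic illustration of similarity-weighted temporal aggregation, not the thesis's exact method: the function name, cosine-similarity weighting, and softmax normalization are assumptions for the sketch.

```python
import numpy as np

def aggregate_features(target, supports):
    """Refine a target detection's feature vector using support features
    from detections in neighbouring frames.

    target:   (d,) feature vector from the current (possibly deteriorated) frame.
    supports: (n, d) feature vectors from detections in other frames.
    Returns the aggregated (d,) feature vector.
    """
    # Cosine similarity between the target and each support feature.
    t = target / np.linalg.norm(target)
    s = supports / np.linalg.norm(supports, axis=1, keepdims=True)
    sims = s @ t
    # Softmax turns similarities into aggregation weights, so features from
    # dissimilar (e.g. blurred or occluded) frames contribute less.
    w = np.exp(sims - sims.max())
    w /= w.sum()
    return w @ supports
```

A classifier would then score the aggregated vector instead of the single-frame feature, which is how temporal aggregation can recover the correct category in frames where the object's appearance is degraded.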