Nowadays, the huge amount of available video content demands automatic systems for understanding it. In particular, human event recognition has become a relevant research area, motivated by a variety of promising applications in the private and public sectors. In this context, system design is a challenging task, as many issues arise related to system structure and performance in specific scenarios. To improve current systems, the Cognitive Computer Vision paradigm was recently proposed to study the relation of a system with its environment, its results and the available resources. However, its use in video analysis has achieved limited success in real scenarios. This thesis addresses the use of semantics (high-level knowledge representations) and feedback processing schemes for human event recognition in video content.
The first part starts by modeling the high-level knowledge related to video events in terms of both the application domain and the analysis system. This model suits the needs of many applications, as it is not restricted to any specific implementation. Its practical use for guiding video analysis is then explored in two situations: automatic workflow composition and human-related event recognition. The former focuses on automatically selecting and ordering the most appropriate tools, among those available in the system, for analyzing a specific domain.
The latter focuses on defining adequate structures for video event recognition that account for temporal information and the uncertainty of the low-level analysis. Experiments on real datasets demonstrate the efficiency of both use cases.
In the second part, a generic feedback processing scheme is proposed that allows a variable analysis effort according to the complexity of the input data and an estimate of the output quality. This scheme is then applied to the processing stages of a traditional video event recognition system. Compared to the traditional (sequential) system, the use of feedback increases precision and greatly reduces the computational cost. Next, this thesis focuses on a critical feedback-related issue: estimating output quality without ground-truth data. The foreground segmentation and object tracking stages are studied by providing taxonomies of the current literature and by comparing the most representative approaches. Results show that different approaches should be used to detect specific performance characteristics. Finally, a novel approach is presented for evaluating object tracking without ground truth; its adaptive capability bounds the computational cost, making it usable for long sequences in which the tracking algorithm is expected to fail.
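The general idea of varying analysis effort through feedback can be illustrated with a minimal Python sketch. It is not the scheme proposed in the thesis: `feedback_process`, `run_stage`, `estimate_quality`, the integer effort levels, the quality threshold and the toy stage are all hypothetical placeholders chosen for illustration. The loop re-runs a processing stage at a higher effort level only while the estimated output quality stays below a threshold, so easy inputs are processed cheaply and hard ones receive more computation, up to a fixed budget.

```python
from typing import Any, Callable, Tuple


def feedback_process(
    data: Any,
    run_stage: Callable[[Any, int], Any],
    estimate_quality: Callable[[Any], float],
    quality_threshold: float = 0.8,
    initial_effort: int = 1,
    max_effort: int = 3,
) -> Tuple[Any, int, float]:
    """Hypothetical feedback loop: escalate effort until the estimated
    quality reaches the threshold or the effort budget is exhausted."""
    effort = initial_effort
    output = run_stage(data, effort)
    quality = estimate_quality(output)  # no ground truth is used here
    while quality < quality_threshold and effort < max_effort:
        effort += 1  # feedback: low estimated quality -> more effort
        output = run_stage(data, effort)
        quality = estimate_quality(output)
    return output, effort, quality


# Toy stage (hypothetical): "difficulty" stands in for input complexity.
def toy_stage(difficulty: int, effort: int) -> dict:
    return {"difficulty": difficulty, "effort": effort}


# Toy quality estimate: more effort relative to difficulty -> higher quality.
def toy_quality(output: dict) -> float:
    return min(1.0, output["effort"] / output["difficulty"])


if __name__ == "__main__":
    for difficulty in (1, 3, 5):
        _, used, q = feedback_process(difficulty, toy_stage, toy_quality)
        print(f"difficulty={difficulty} effort={used} quality={q:.2f}")
```

Starting from a low effort level and escalating only on demand is what lets the average cost drop for simple inputs, while `max_effort` bounds the worst case. In a real system, `estimate_quality` would be a ground-truth-free measure of the kind the thesis studies for the segmentation and tracking stages.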