Abstract of Vision based sensor substitution in robotic assisted surgery

Arturo Marbán González

  • Perceiving and understanding the world represents a long-term goal in the field of Artificial Intelligence (AI). Advances in the field of Machine Learning (ML), and specifically in Deep Learning (DL), have led to the development of powerful models based on Deep Neural Networks (DNN) capable of interpreting high-dimensional data, leading to higher performance in perception-related tasks.

    DNNs such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks greatly contribute to the state of the art in image recognition and in the processing of long sequences of data, respectively. CNNs excel at modeling data with spatial structure, while LSTM networks excel at modeling data with temporal structure. Together they represent the building blocks for modeling the spatio-temporal structure of data such as video sequences. Nonetheless, these models remain underexploited in the medical domain, where images and video sequences are frequently available, notably in Minimally Invasive Surgery (MIS). Furthermore, most ongoing research relies on DNNs designed as classifiers rather than regressors.
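To make the CNN/LSTM pairing concrete, the sketch below runs a sequence of per-frame feature vectors (standing in for CNN outputs) through a minimal NumPy LSTM cell. The weight layout, dimensions, and random features are illustrative assumptions, not the dissertation's architecture; the point is how the cell state carries temporal structure across frames.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM cell step; all four gates packed into a single matrix W."""
    z = W @ np.concatenate([h, x]) + b
    H = h.size
    f = sigmoid(z[:H])          # forget gate: what to keep from the old cell state
    i = sigmoid(z[H:2 * H])     # input gate: how much new information to write
    o = sigmoid(z[2 * H:3 * H]) # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * H:])      # candidate cell update
    c_new = f * c + i * g       # cell state accumulates long-term temporal context
    h_new = o * np.tanh(c_new)  # hidden state is the per-frame output
    return h_new, c_new

rng = np.random.default_rng(0)
D, H = 8, 4                     # per-frame feature dim, hidden dim (arbitrary)
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for frame_feat in rng.normal(size=(5, D)):  # 5 "video frames" of CNN features
    h, c = lstm_step(frame_feat, h, c, W, b)
print(h.shape)
```

In the video-regression setting described here, a CNN would produce `frame_feat` from each image, and the final (or per-step) hidden state `h` would feed a regression head.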

    Recently, in the context of Unsupervised Learning (UL), Generative Adversarial Networks (GAN) have gained popularity as powerful generative models. GANs consist of two neural networks, a Generator (G) and a Discriminator (D). The task of D is to classify samples from ground-truth data and those rendered by G as real and fake data, respectively. The objective of G, in turn, is to "fool" D by learning to generate samples that resemble the ground-truth data. As the training process evolves, G learns the distribution of the real data. This framework is flexible and can be applied to different neural network architectures, such as Convolutional Auto-Encoders (CAE), resulting in better image reconstruction quality.
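The adversarial objective can be sketched with the standard binary cross-entropy formulation. The discriminator scores below are hypothetical stand-ins for D's outputs; the snippet only illustrates the two opposing losses, not a full training loop.

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy between D's probabilities p and labels y."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

# Hypothetical discriminator outputs on one batch
d_real = np.array([0.9, 0.8, 0.95])  # D(x): scores on ground-truth samples
d_fake = np.array([0.2, 0.1, 0.3])   # D(G(z)): scores on generated samples

# D is trained to label real samples 1 and generated samples 0 ...
d_loss = bce(d_real, np.ones(3)) + bce(d_fake, np.zeros(3))
# ... while G is trained so that D labels its samples 1 ("fooling" D)
g_loss = bce(d_fake, np.ones(3))
print(d_loss, g_loss)
```

As G improves, `d_fake` rises toward the real-sample scores and `g_loss` shrinks; at the theoretical equilibrium D outputs 0.5 everywhere.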

    In this dissertation, a regression model based on DNNs is described, with applications in the context of Robot-Assisted Minimally Invasive Surgery (RAMIS). First, this model is developed in a Supervised Learning (SL) setting. Subsequently, it is extended to a Semi-Supervised Learning (SSL) approach by using a CAE and leveraging the advantages of the GAN framework. The regression model is designed to learn a complex relationship between video sequences and the evolution of continuous variables over time. The objective of this research is to perform Vision-Based Sensor Substitution (VBSS). The DNN thus constitutes a "virtual sensor" that estimates the evolution of physical variables over time. The target applications are those in which the only available sensor is a camera system and the use of other electronic sensing devices is constrained.

    In the context of RAMIS, endowing robotic systems with force-feedback capability is of great help, as it provides the surgeon with essential information for better performance. The regression model designed for SFE is generic enough to be used in other domains with an equivalent mathematical formulation. Therefore, it has also been studied and evaluated in the application of surgical instrument tracking, specifically in the estimation of the tool-tip position and velocity (in 3D space) from monocular video sequences. Such information is useful in tasks related to surgical gesture classification.
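Once per-frame 3D tool-tip positions are estimated from video, the corresponding velocity can be recovered by differentiating the position signal over the frame interval. The positions and frame rate below are invented for illustration; the finite-difference step is the standard way to relate the two quantities.

```python
import numpy as np

fps = 25.0                      # assumed camera frame rate (hypothetical)
dt = 1.0 / fps                  # time between consecutive frames, in seconds

# Hypothetical per-frame 3D tool-tip positions estimated by the network (metres)
p = np.array([[0.00, 0.00, 0.10],
              [0.01, 0.00, 0.10],
              [0.02, 0.01, 0.11],
              [0.04, 0.01, 0.11]])

# Finite differences (central in the interior, one-sided at the ends)
# turn the position track into a per-frame velocity estimate (m/s)
v = np.gradient(p, dt, axis=0)
print(v[0])
```

In practice the estimated positions are noisy, so the position signal is usually smoothed (or the velocity regressed directly, as in this work) rather than differentiated raw.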

    The results of this dissertation suggest that the developed regression models, which are based on DNNs, can be used to address problems in which the estimation of continuous time-varying signals from video sequences is required.
