This thesis proposes a methodology for reconstructing subjects and their deformations using general-purpose 3D sensors and registration techniques. The main steps in many 3D computer vision systems are acquisition, alignment of the different views, and final analysis. To provide appropriate data to the analysis step, the acquisition must obtain data that the alignment or registration method can use to correctly estimate the final model to be analysed. The research focuses on providing a set of methods that improve perception and registration in adverse situations, where the process operates at the limits of the sensor's sensitivity. To this end, the state of the art has been reviewed to study the main proposals and the issues that remain unsolved. When the sensitivity of a camera is lower than a specific application requires, features sometimes cannot be distinctively perceived in the image. Registration methods rely on these features to calculate the alignment, so if no reliable features are extracted during acquisition, the method may be unable to align the data properly. These situations worsen in the presence of outliers, noise, missing data, etc. As these problems are not specific to any particular application, in this thesis I propose a general methodology to overcome registration problems in adverse situations.
In the review of the state of the art, different kinds of 3D sensors have been studied, and I have concluded that general-purpose RGB-D sensors are well suited to evaluating the thesis proposal. These sensors have a sensitivity that may be appropriate for a large number of applications but is not optimized for any specific one. It is therefore possible to evaluate the proposal by operating at the limits of sensitivity of these sensors. Moreover, they have been widely used in research because of their portability, their affordability and the data they provide, including colour and depth. Furthermore, new applications that make use of these cameras can achieve a high social impact.
The proposed methodology includes two main sub-objectives: to develop and evaluate a rigid registration method that overcomes the problem of low sensitivity using a model-based approach; and to develop and evaluate a non-rigid registration method that overcomes the low sensitivity problem by incorporating multiple data spaces in the process.
The methodology for 3D registration has its formal expression in the Active Vision Model defined for registration tasks. This model instantiates the Active Vision Model initially defined in the thesis of Andrés Fuster-Guilló and continued by Jorge Azorín-López. In it, I define the elements to take into account in the acquisition process and the transformations that improve the perception of the acquisition system in order to overcome registration problems.
To provide a rigid registration method that finely aligns 3D point sets, I propose in this thesis a model-based method that uses 3D markers to overcome the problem of low sensitivity and noise. These markers are objects composed of planar faces with a simple geometry (e.g. cubes, pyramids, etc.). The models of the planes for each face of the marker are estimated beforehand to minimize the effects of noise, outliers or missing data. To extract them, I propose the Multiplane Model Estimation (MME) method. This method uses prior knowledge of the object, provided as constraints on the model estimation solution, to accurately calculate the planes that model each face. The method has two steps. First, I propose the Point Cloud Clustering (PCC) method to estimate the points that belong to each face. The points are clustered using kMeans with points and normals as inputs, and the constraints are then used to decide which clusters define the faces of the marker. Second, the models of the planes are calculated. For this purpose, a variant of RANSAC called Multiple-Constraints RANSAC (MC-RANSAC) is proposed. Using the clustered points, an initial model composed of the normal and the centroid of each face is estimated. With the initial planar models of the faces, the constraints that define the object (in this case, the angles between the normals of the faces) are evaluated. If the planes fit the constraints, the inliers are evaluated to finally decide whether the initial planes are accepted. The proposal is evaluated using synthetic and real data on three objects: a cube, a pyramid and a double pyramid. On the synthetic data, obtained with the Blensor plug-in for Blender, PCC achieves over 50% proper clustering of the data; on the real data, visual inspection shows proper clustering. MC-RANSAC shows a maximum cumulative error of 4 degrees for all tested data. The method has been compared with a clustered RANSAC variant and provides better model estimation accuracy in most cases.
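The acceptance test described for MC-RANSAC can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`fit_plane`, `satisfies_constraints`, `count_inliers`), the least-squares plane fit via SVD, the tolerance values and the synthetic cube faces are all assumptions made for illustration, and the random sampling loop of RANSAC is omitted.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a cluster: returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    # The normal is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]

def angle_deg(n1, n2):
    """Unsigned angle between two plane normals in degrees (ignores normal sign)."""
    c = abs(float(np.dot(n1, n2)))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

def satisfies_constraints(normals, expected_deg, tol_deg):
    """Check every pair of face normals against the expected inter-face angle."""
    n = len(normals)
    return all(
        abs(angle_deg(normals[i], normals[j]) - expected_deg) <= tol_deg
        for i in range(n) for j in range(i + 1, n)
    )

def count_inliers(points, centroid, normal, threshold):
    """Number of points closer to the plane than the threshold."""
    return int(np.sum(np.abs((points - centroid) @ normal) < threshold))

# Hypothetical data: three noisy, mutually perpendicular faces of a unit cube.
rng = np.random.default_rng(0)
uv = rng.uniform(0.0, 1.0, size=(3, 200, 2))
zeros = np.zeros((200, 1))
faces = [
    np.hstack([uv[0], zeros]),                        # face on z = 0
    np.hstack([zeros, uv[1]]),                        # face on x = 0
    np.hstack([uv[2][:, :1], zeros, uv[2][:, 1:]]),   # face on y = 0
]
faces = [f + rng.normal(0.0, 0.002, f.shape) for f in faces]

planes = [fit_plane(f) for f in faces]
normals = [n for _, n in planes]
# Constraint for a cube: adjacent faces meet at 90 degrees (assumed tolerance).
accepted = satisfies_constraints(normals, expected_deg=90.0, tol_deg=2.0)
inliers = [count_inliers(f, c, n, threshold=0.01)
           for f, (c, n) in zip(faces, planes)]
```

In this sketch the candidate planes are kept only if `accepted` is true and the inlier counts are high enough; a full RANSAC variant would repeat the fit on random subsets and keep the best accepted candidate.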
Once the models of the markers have been estimated, I propose a method for rigid registration. The method, named MUltiplane 3D MArker based Registration (µ-MAR), uses the markers, specifically cubes, to overcome noise and low sensitivity problems. It aligns the planes that compose the markers simultaneously and then transforms the whole environment, reconstructing the subject of interest. The rotation is calculated from the normals, and the translation from the projection of the centroids onto the corresponding target planes. The whole process is embedded in a multi-view framework that iteratively aligns a set of views. The proposal is evaluated using synthetic and real data. For synthetic data, simple shapes have been evaluated quantitatively using the Hausdorff distance and compared with ICP; the proposed method achieves a better alignment in all cases. To evaluate the method in real situations, µ-MAR has been compared with several state-of-the-art methods: ICP, RANSAC, KinectFusion and RGBDemo. By visual inspection, the proposed method obtains better registration accuracy in all cases evaluated.
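One way to read the rotation and translation steps is sketched below. This is an interpretation, not the µ-MAR implementation: it assumes the rotation is obtained from paired unit normals with the standard Kabsch (SVD) solution, and that the translation is the least-squares vector that places each rotated source centroid on its target plane. The function names and the test transform are hypothetical.

```python
import numpy as np

def rotation_from_normals(src_normals, dst_normals):
    """Kabsch solution: rotation R minimising sum ||R s_i - d_i||^2 over paired normals."""
    h = src_normals.T @ dst_normals           # 3x3 cross-covariance of the pairs
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

def translation_from_planes(rot, src_centroids, dst_normals, dst_centroids):
    """Solve n_i . (R c_i + t - q_i) = 0 for t in the least-squares sense."""
    rotated = src_centroids @ rot.T
    rhs = np.einsum("ij,ij->i", dst_normals, dst_centroids - rotated)
    t, *_ = np.linalg.lstsq(dst_normals, rhs, rcond=None)
    return t

# Hypothetical marker: three perpendicular cube faces seen from two views.
theta = np.radians(30.0)
rot_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
t_true = np.array([0.2, -0.1, 0.5])

src_normals = np.eye(3)                       # rows: normals of the x, y, z faces
src_centroids = np.array([[0.0, 0.5, 0.5],
                          [0.5, 0.0, 0.5],
                          [0.5, 0.5, 0.0]])
dst_normals = src_normals @ rot_true.T        # row i is rot_true @ n_i
dst_centroids = src_centroids @ rot_true.T + t_true

rot = rotation_from_normals(src_normals, dst_normals)
t = translation_from_planes(rot, src_centroids, dst_normals, dst_centroids)
```

Because only the component of the centroid offset along each target normal is used, the sketch does not require the observed centroids of a face to coincide between views, only that they lie on the same plane; at least three non-parallel planes are needed for the translation to be fully determined.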
To analyse the deformations of the subjects, a new non-rigid registration framework is presented in this thesis. In this framework, I propose combining multiple spaces (e.g. location, orientation, colour, topology, etc.) to improve the perception of the features needed to estimate the transformations. Using several spaces reduces the uncertainty in cases where the correspondences are hard to determine. The proposal is based on Coherent Point Drift (CPD), which it extends to handle multiple inputs. A detailed study of the different combinations of colour and location information is presented. Specifically, the framework is instantiated as Colour Coherent Point Drift (CCPD), which registers location data using both colour and location information to estimate the correspondences.
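The effect of adding a colour space to the correspondence estimation can be illustrated with a simplified sketch. It is not the CCPD formulation from the thesis: it assumes independent isotropic Gaussians in location and colour whose likelihoods are multiplied, it omits the uniform outlier term of CPD, and the function name and bandwidth parameters (`sigma_loc`, `sigma_col`) are hypothetical.

```python
import numpy as np

def soft_correspondences(x_loc, y_loc, x_col, y_col, sigma_loc, sigma_col):
    """Row-normalised probabilities P[n, m] that data point n matches model point m."""
    # Squared distances between every data/model pair, in each space.
    d_loc = ((x_loc[:, None, :] - y_loc[None, :, :]) ** 2).sum(axis=2)
    d_col = ((x_col[:, None, :] - y_col[None, :, :]) ** 2).sum(axis=2)
    # Product of Gaussians = sum of exponents; work in log space for stability.
    log_p = -d_loc / (2.0 * sigma_loc**2) - d_col / (2.0 * sigma_col**2)
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

# One data point equidistant from two model points that differ only in colour.
x_loc = np.array([[0.0, 0.0, 0.0]])
y_loc = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
x_col = np.array([[1.0, 0.0, 0.0]])                     # red
y_col = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])    # red, blue

# A very wide colour bandwidth effectively disables the colour term.
p_location_only = soft_correspondences(x_loc, y_loc, x_col, y_col, 1.0, 1e6)
p_with_colour = soft_correspondences(x_loc, y_loc, x_col, y_col, 1.0, 0.5)
```

With location alone the two candidates are indistinguishable, whereas the colour term concentrates the probability on the model point of matching colour, which is the kind of ambiguity reduction the multi-space framework aims for.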
Both real and synthetic data have been evaluated, showing the good performance of the method in different situations. The synthetic data include both simple and realistic shapes. The simple shapes are a fish and a face, as used with the original CPD, and they serve to evaluate the method in the presence of outliers, missing data and large deformations. The RMS error shows that CCPD registers more accurately than CPD. The realistic data, obtained using Blensor, comprise two deformations (a smaller and a larger one) of a flower and of a face. The deformations of the face are elastic, since the topology is preserved, whereas the deformations of the flower change its shape, as in growth. Moreover, to improve the input data, five downsampling techniques have been evaluated: bilinear interpolation, normal-based, colour-based, a combination of normal- and colour-based, and GNG sampling. CCPD outperforms CPD in these tests as well. Finally, real experiments using a Primesense Carmine RGB-D sensor have been performed for three facial expressions; visual inspection shows that the proposal improves registration accuracy over the original version.
Finally, the conclusions summarize the main contributions of this thesis and propose several areas of future work covering the different parts involved in the thesis: acquisition, rigid registration and non-rigid registration. The publications resulting from the research are also presented.