Dialnet


Representing functional data in reproducing Kernel Hilbert spaces with applications to clustering, classification and time series problems

  • Authors: Javier González Hernández
  • Thesis supervisors: Alberto Muñoz García (supervisor)
  • Defense: Universidad Carlos III de Madrid (Spain), 2010
  • Language: English
  • Thesis committee: Santiago Velilla Cerdán (chair), Emilio Carrizosa Priego (secretary), María Dolores Ugarte Martínez (member), C. M. Cuadras (member), Wenceslao González Manteiga (member)
  • Abstract
    • In modern data analysis areas such as Image Analysis, Chemometrics or Information Retrieval, the raw data are often complex, and their representation in Euclidean spaces is not straightforward.

      However, most statistical data analysis techniques are designed to deal with points in Euclidean spaces, and hence a representation of the data in some Euclidean coordinate system is always required as a preliminary step before applying multivariate analysis techniques. This process is crucial to the success of the data analysis methodologies and is a core contribution of this thesis.

      In this work we will develop general data representation techniques in the framework of Functional Data Analysis (FDA) for classification and clustering problems. In Chapter 1 we motivate the problems to solve, describe the roadmap of the contributions and set up the notation of this work.

      In Chapter 2 we review some aspects concerning Reproducing Kernel Hilbert Spaces (RKHSs), Regularization Theory, Integral Operators, Support Vector Machines and Kernel Combinations.

      In Chapter 3 we propose a new methodology to obtain finite-dimensional representations of functional data. The key idea is to consider each functional curve as a point in a general function space and then project these points onto a Reproducing Kernel Hilbert Space (RKHS) with the aid of Regularization Theory. We describe the projection methods, analyze their theoretical properties and develop a strategy to select appropriate RKHSs to represent the functional data.
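As a rough illustration of the idea in Chapter 3, a sampled curve can be mapped to a finite vector by fitting it with a regularized element of an RKHS and keeping the fitted coefficients. The sketch below is an example under stated assumptions, not the thesis's method: the Gaussian kernel, the values of `sigma` and `gamma`, the function names and the synthetic curves are all chosen here for illustration, and the abstract does not specify the exact representation used.

```python
import numpy as np

def gaussian_kernel(s, t, sigma=0.2):
    """Gaussian kernel matrix between two 1-D arrays of sample points."""
    diff = s[:, None] - t[None, :]
    return np.exp(-diff**2 / (2.0 * sigma**2))

def rkhs_coefficients(x, y, gamma=1e-3, sigma=0.2):
    """Coefficients alpha of the regularized fit f = sum_i alpha_i K(x_i, .).

    Minimizes (1/n) * sum_i (y_i - f(x_i))^2 + gamma * ||f||_K^2,
    whose solution is alpha = (K + gamma * n * I)^{-1} y.
    """
    n = len(x)
    K = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(K + gamma * n * np.eye(n), y)

def evaluate(alpha, x, t, sigma=0.2):
    """Evaluate the fitted function at the points t."""
    return gaussian_kernel(t, x, sigma) @ alpha

# Two synthetic noisy curves observed on a common grid (illustrative data only).
x = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(0)
curves = [
    np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(x.size),
    np.cos(2 * np.pi * x) + 0.05 * rng.standard_normal(x.size),
]

# Each curve becomes one row: a finite-dimensional representation.
X = np.vstack([rkhs_coefficients(x, y) for y in curves])
```

Each row of `X` is an ordinary Euclidean vector, so standard multivariate clustering or classification tools can be applied to it.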

      Following the functional data analysis approach, we develop in Chapter 4 a new procedure to deal with proximity (similarity or distance) matrices in classification problems by studying the connection between proximity measures and a certain class of integral operators. The idea is to develop a methodology able to estimate an integral operator whose associated kernel function, evaluated at the sample, approximates the sample proximity matrix of the problem. To show the broad scope of application of the methodology, we apply it to three cases: (1) classification problems where the only available information about the data is an asymmetric similarity matrix; (2) partially labeled classification problems; and (3) classification problems where several sources of information are available and can be combined to obtain the discrimination function.
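To make case (1) more concrete, the sketch below shows one generic way to turn an asymmetric similarity matrix into a symmetric positive semidefinite matrix that a kernel method can use. This is a common baseline (symmetrize, then clip negative eigenvalues), named plainly as a stand-in: it is not the integral-operator estimator developed in the thesis, and the function name and the example matrix are hypothetical.

```python
import numpy as np

def psd_kernel_from_similarity(S):
    """Return a symmetric PSD matrix close to the similarity matrix S.

    Step 1: keep the symmetric part of S.
    Step 2: set negative eigenvalues to zero so the result is a valid
            kernel (Gram) matrix.
    """
    S_sym = 0.5 * (S + S.T)
    eigvals, eigvecs = np.linalg.eigh(S_sym)
    eigvals_clipped = np.clip(eigvals, 0.0, None)
    # Rebuild the matrix from the clipped spectrum.
    return (eigvecs * eigvals_clipped) @ eigvecs.T

# Illustrative asymmetric similarities between three objects.
S = np.array([
    [1.0, 1.0, 0.0],
    [0.8, 1.0, 1.0],
    [0.2, 0.8, 1.0],
])
K = psd_kernel_from_similarity(S)
```

The resulting `K` can be passed to any classifier that accepts a precomputed kernel matrix, which is the role the estimated kernel plays in the description above.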

      In Chapter 5 we propose a spectral framework for information fusion when the sources of information are given by a set of proximity matrices. Our approach is based on the simultaneous diagonalization of the original matrices of the problem, and it represents a natural way to manage the redundant information involved in the fusion process. In particular, we define a new metric for proximity matrices and propose a method that automatically eliminates the redundant information among a set of matrices when they are combined.
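The building block mentioned here, simultaneous diagonalization, can be sketched for the simplest case of two symmetric matrices where the second is positive definite. The code below is only that linear-algebra step under those assumptions; the thesis's metric and fusion rule are not given in the abstract and are not reproduced, and the function name and random test matrices are hypothetical.

```python
import numpy as np

def simultaneous_diagonalization(A, B):
    """Find V with V.T @ A @ V = diag(w) and V.T @ B @ V = I.

    Assumes A is symmetric and B is symmetric positive definite.
    Uses the Cholesky factor B = L @ L.T to whiten B, then an
    eigendecomposition of the whitened A.
    """
    L = np.linalg.cholesky(B)
    # C = L^{-1} A L^{-T}, computed with two triangular-system solves.
    C = np.linalg.solve(L, np.linalg.solve(L, A).T)
    C = 0.5 * (C + C.T)  # remove round-off asymmetry
    w, U = np.linalg.eigh(C)
    V = np.linalg.solve(L.T, U)  # V = L^{-T} U
    return w, V

# Illustrative symmetric matrices standing in for two proximity sources.
rng = np.random.default_rng(1)
M1 = rng.standard_normal((4, 4))
M2 = rng.standard_normal((4, 4))
A = 0.5 * (M1 + M1.T)
B = M2 @ M2.T + np.eye(4)

w, V = simultaneous_diagonalization(A, B)
```

In the common basis `V`, both matrices are diagonal, so their spectra can be compared entry by entry, which is what makes shared (redundant) structure visible.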

      We conclude the contributions of the thesis in Chapter 6 with a battery of simulated and real examples devoted to comparing the performance of the proposed methodologies with the state of the art in representation methods. Finally, in Chapter 7 we discuss the topics described above and propose some future lines of research that we believe are natural extensions of the work developed in this thesis.

