Machine learning with functional data: Methodological advances and computational tools

  • Author: Carlos Ramos Carreño
  • Thesis supervisors: Alberto Suárez González (supervisor), José Luis Torrecilla Noguerales (supervisor)
  • Defense: Universidad Autónoma de Madrid (Spain), 2023
  • Language: Spanish
  • Number of pages: 248
  • Thesis examination committee: Jörg Fabian Dominik Scheipl (chair), Carlos María Alaiz Gudín (secretary), Stanislav Nagy (member)
  • Doctoral program: Programa de Doctorado en Ingeniería Informática y de Telecomunicación, Universidad Autónoma de Madrid
  • Abstract
    • Functional data consist of observations that depend on a continuous parameter, such as time or space.

      These types of data appear in many problems of practical interest in economics, biology, medicine, and environmental sciences, among others.

      They present characteristics that are markedly different from multivariate data, which are the most prevalent object of study in statistics and machine learning.

      For these reasons, it is important to have at one's disposal tools that take into account and exploit the functional nature of the observations.

      The main goal of this thesis is to design statistical methods and computational tools for machine learning with functional data.

      A first set of methodological contributions addresses dimensionality reduction, clustering, and classification.

      In particular, we derive optimal rules for the classification of Gaussian processes.

      Special attention is devoted to the singular case in which the processes are orthogonal and near-perfect classification (zero Bayes error) is obtained asymptotically.
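
      The singular case can be illustrated with a textbook example that is assumed here for illustration and is not taken from the thesis: two Brownian motions on [0, 1] that differ only in their scale. Their laws are mutually singular, and the likelihood-ratio rule for the discretized trajectories reduces to a threshold on the realized quadratic variation. The sketch below (all function names are hypothetical) estimates the accuracy of that rule on a coarse grid and on a fine grid; the accuracy should approach 1 as the grid is refined.

```python
import math
import random


def brownian_increments(sigma, n_steps, rng, horizon=1.0):
    """Increments of sigma * (standard Brownian motion) on a uniform grid."""
    dt = horizon / n_steps
    return [rng.gauss(0.0, sigma * math.sqrt(dt)) for _ in range(n_steps)]


def classify(increments, sigma0, sigma1, horizon=1.0):
    """Likelihood-ratio rule with equal priors; assumes sigma0 < sigma1.

    For independent Gaussian increments the rule reduces to thresholding
    the realized quadratic variation (the sum of squared increments).
    """
    quad_var = sum(d * d for d in increments)
    threshold = (
        horizon
        * math.log(sigma1**2 / sigma0**2)
        / (sigma0**-2 - sigma1**-2)
    )
    return 1 if quad_var > threshold else 0


def accuracy(n_steps, n_trials=200, sigma0=1.0, sigma1=math.sqrt(2.0), seed=0):
    """Fraction of simulated trajectories assigned to the correct class."""
    rng = random.Random(seed)
    hits = 0
    for trial in range(n_trials):
        label = trial % 2  # balanced classes
        sigma = sigma1 if label else sigma0
        increments = brownian_increments(sigma, n_steps, rng)
        hits += classify(increments, sigma0, sigma1) == label
    return hits / n_trials


coarse_accuracy = accuracy(n_steps=4)    # few observation points per curve
fine_accuracy = accuracy(n_steps=500)    # dense observation grid
```

      With only four increments per curve the two classes overlap considerably, whereas with 500 increments the quadratic variation concentrates around the variance of each class and misclassifications become extremely rare.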

      A second contribution consists in an exhaustive theoretical and empirical analysis of recursive maxima hunting (RMH), a filter method for variable selection that takes advantage of the functional nature of the data.

      In recursive maxima hunting, variables are selected iteratively.

      At each step, one selects the variable with the strongest dependence on the class label, as measured by the distance covariance.

      The contribution of that variable is then removed by subtracting, from each functional observation, the expectation of the underlying process conditioned on the value of the selected variable.
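
      The select-and-correct loop described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation from the thesis: the conditional expectation is estimated with a least-squares linear fit (which coincides with the true conditional expectation only for jointly Gaussian variables), a fixed number of variables is selected instead of using a stopping criterion, and all function names are hypothetical.

```python
import random


def distance_covariance_sq(x, y):
    """Squared sample distance covariance of two scalar samples."""
    n = len(x)

    def centered(values):
        dist = [[abs(a - b) for b in values] for a in values]
        row = [sum(r) / n for r in dist]  # row means (matrix is symmetric)
        total = sum(row) / n
        return [[dist[i][j] - row[i] - row[j] + total for j in range(n)]
                for i in range(n)]

    a, b = centered(x), centered(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


def recursive_maxima_hunting(curves, labels, n_select):
    """Return the grid indices selected by a simplified RMH loop."""
    n, m = len(curves), len(curves[0])
    resid = [list(curve) for curve in curves]  # corrected copies of the data
    y = [float(label) for label in labels]
    selected = []
    for _ in range(n_select):
        # 1. Score every remaining grid point by its dependence on the label.
        scores = {j: distance_covariance_sq([r[j] for r in resid], y)
                  for j in range(m) if j not in selected}
        best = max(scores, key=scores.get)
        selected.append(best)
        # 2. Subtract the linear estimate of E[X(t_j) | X(t_best)].
        xb = [r[best] for r in resid]
        mean_b = sum(xb) / n
        var_b = sum((v - mean_b) ** 2 for v in xb)
        for j in range(m):
            xj = [r[j] for r in resid]
            mean_j = sum(xj) / n
            cov = sum((xb[i] - mean_b) * (xj[i] - mean_j) for i in range(n))
            slope = cov / var_b if var_b > 0 else 0.0
            for i in range(n):
                resid[i][j] -= mean_j + slope * (xb[i] - mean_b)
    return selected


# Synthetic example (assumed data): random-walk noise plus a triangular
# bump in the mean of class 1, centered at grid index 12.
random.seed(0)
grid_size, peak = 25, 12
labels = [i % 2 for i in range(40)]
curves = []
for label in labels:
    level, curve = 0.0, []
    for j in range(grid_size):
        level += random.gauss(0.0, 0.1)
        bump = 3.0 * max(0.0, 1.0 - abs(j - peak) / 4.0)
        curve.append(level + label * bump)
    curves.append(curve)

selected = recursive_maxima_hunting(curves, labels, n_select=3)
```

      After each correction the selected variable is reduced to (numerically) zero and its highly correlated neighbors lose most of their dependence on the label, so later selections tend to move to other regions of the curve instead of clustering around the first maximum.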

      Finally, the behavior of the fuzzy C-means clustering method when applied to functional data is analyzed.

      In the second part of the thesis, a suite of computational tools for retrieval, representation, exploratory analysis, preprocessing, and machine learning for functional data is introduced.

      Specifically, the Python libraries scikit-datasets and rdata have been developed to handle multivariate and functional datasets.

      These packages facilitate the retrieval of these data from a variety of sources, their conversion to a unified format, and the empirical evaluation of machine learning methods that utilize them.

      A prominent contribution of this thesis is the development of the library scikit-fda, a Python package for Functional Data Analysis (FDA).

      It provides a comprehensive set of tools for statistical analysis and machine learning with functional data.

      The library is built upon and integrated into the scientific Python ecosystem.

      In particular, it conforms to the scikit-learn application programming interface so as to take advantage of the functionality for machine learning provided by this package: pipelines, hyperparameter tuning, and model selection, among others.
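
      The value of conforming to this interface comes from a small contract: `fit` learns from data and returns the estimator itself, while `transform` or `predict` apply what was learned, so independent components can be chained without knowing about each other. The sketch below mimics that contract in plain Python on curves sampled on a common grid. The three classes are hypothetical stand-ins written for illustration; they are not part of scikit-learn or scikit-fda.

```python
class MovingAverageSmoother:
    """Hypothetical transformer: moving-average smoothing of sampled curves."""

    def __init__(self, window=3):
        self.window = window  # hyperparameter, stored unchanged

    def fit(self, X, y=None):
        return self  # nothing to learn; returning self allows chaining

    def transform(self, X):
        half = self.window // 2
        smoothed = []
        for curve in X:
            row = []
            for j in range(len(curve)):
                neighbours = curve[max(0, j - half): j + half + 1]
                row.append(sum(neighbours) / len(neighbours))
            smoothed.append(row)
        return smoothed


class NearestMeanClassifier:
    """Hypothetical classifier: closest class-mean curve in squared L2 distance."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.means_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.means_[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def sq_dist(u, v):
            return sum((a - b) ** 2 for a, b in zip(u, v))

        return [min(self.classes_, key=lambda c: sq_dist(x, self.means_[c]))
                for x in X]


class MiniPipeline:
    """Hypothetical pipeline: transformers followed by a final predictor."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


def make_curve(level, sign):
    """Constant level plus a deterministic zig-zag perturbation."""
    return [level + sign * 0.3 * (-1) ** j for j in range(8)]


X_train = [make_curve(0.0, 1), make_curve(0.1, -1),
           make_curve(1.0, 1), make_curve(0.9, -1)]
y_train = [0, 0, 1, 1]

model = MiniPipeline([MovingAverageSmoother(window=3), NearestMeanClassifier()])
model.fit(X_train, y_train)
predictions = model.predict([[0.2] * 8, [0.8] * 8])
```

      In scikit-learn itself, hyperparameter tuning and model selection additionally rely on the `get_params` and `set_params` methods, which estimators normally inherit from `sklearn.base.BaseEstimator`.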

      Finally, the dcor package is an additional contribution of this thesis.

      This package provides tools to compute dependency measures, such as the aforementioned distance covariance, as well as related tests of homogeneity and independence.
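
      For reference, the standard sample definition of the distance covariance (due to Székely, Rizzo, and Bakirov) is reproduced below for a paired sample $(x_k, y_k)$, $k = 1, \dots, n$. The notation is chosen here and is not taken from the thesis.

```latex
\begin{align}
a_{kl} &= \lVert x_k - x_l \rVert, &
A_{kl} &= a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot}, \\
b_{kl} &= \lVert y_k - y_l \rVert, &
B_{kl} &= b_{kl} - \bar{b}_{k\cdot} - \bar{b}_{\cdot l} + \bar{b}_{\cdot\cdot}, \\
\mathcal{V}_n^2(X, Y) &= \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl} B_{kl}.
\end{align}
```

      Here $\bar{a}_{k\cdot}$, $\bar{a}_{\cdot l}$, and $\bar{a}_{\cdot\cdot}$ denote the row, column, and grand means of the distance matrix. For random vectors with finite first moments, the population counterpart of $\mathcal{V}_n^2$ is zero if and only if $X$ and $Y$ are independent, which is what makes the statistic suitable for independence testing.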

      The computational tools developed as part of this thesis have been released as free open-source software, and are open to contributions from the scientific community.

