Transitions in Bayesian model selection problems: network-based recommender system and symbolic regression

  • Author: Oscar Fajardo Fontiveros
  • Thesis supervisors: Marta Sales Pardo (supervisor), Roger Guimerà Manrique (co-supervisor)
  • Defended at: Universitat Rovira i Virgili (Spain), 2021
  • Language: Spanish
  • Thesis committee: Albert Díaz Guilera (chair), Clara Granell Martorell (secretary), José Antonio Cuesta Ruiz (member)
  • Doctoral programme: Programa de Doctorado en Nanociencia, Materiales e Ingeniería Química por la Universidad Rovira i Virgili
  • Links
    • Open-access thesis at: TDX
  • Abstract
    • Model selection problems consist in looking for the best model, given the data and a set of proposed models. Scientists face this task every day when trying to explain the phenomena that happen around us. Two steps of the scientific method are critical here: observation and hypothesis. It is natural to think that scientists solving a model selection problem reason about the data they have collected and, if they have a good idea, exploit it during the selection. But the world is not perfect: our data carry systematic errors, and our hypothesis may be wrong, in which case the model we obtain is wrong too.

      In this thesis we study the interplay between the likelihood and the prior in Bayesian inference for model selection problems. Bayes' theorem has two important terms: the likelihood and the prior. The likelihood tells us how probable our data are given a model, and the prior encodes the information we believe to be true before seeing the data. The prior is a probability distribution over models, chosen according to our hypothesis, that assigns high probability to the models we believe are correct. If our prior is wrong, our predictions will fail; if it is right, our predictions will improve.
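
      In symbols, and as a standard statement of Bayes' theorem rather than a quotation from the thesis, the posterior probability of a model M given data D combines exactly these two terms:

```latex
% Bayes' theorem for model selection:
% posterior = likelihood x prior, normalized over all candidate models M'.
P(M \mid D) = \frac{P(D \mid M)\, P(M)}{\sum_{M'} P(D \mid M')\, P(M')}
```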

      To study this interplay between the likelihood and the prior we solve two problems: a recommender system and symbolic regression. The recommender system problem consists in predicting unobserved user preferences from known ones; here, the data are the ratings that users give to items. We analyze how extra information about users and items (gender, type of item, nationality...) affects the inference. The hypothesis is that similar users give similar ratings to similar items, and vice versa, so in this case the prior carries the information in the metadata. For this study we used a generative model, the Mixed-Membership Stochastic Block Model, within a Bayesian framework, on synthetic data in which we control the correlation between the ratings and the metadata. We examined all possible correlation scenarios and measured how each affects accuracy. When the metadata are fully correlated with the data, the best option is to use the metadata; when there is no correlation, the metadata make predictions worse; and when the correlation is high but imperfect, using both metadata and data gives the best performance.
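
      As a rough illustration of the generative side of such a model, here is a minimal sketch that samples synthetic ratings from mixed group memberships; all sizes, variable names, and distributions below are made up for the example and are not the thesis code:

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items = 5, 4      # toy sizes (hypothetical)
K, L, S = 2, 2, 3            # user groups, item groups, rating values

# Mixed memberships: each user/item belongs to every group with some weight.
theta = rng.dirichlet(np.ones(K), size=n_users)   # users x user-groups
eta = rng.dirichlet(np.ones(L), size=n_items)     # items x item-groups

# p[k, l, s]: probability that a user in group k gives rating s
# to an item in group l (sums to 1 over s).
p = rng.dirichlet(np.ones(S), size=(K, L))

def rating_probs(u, i):
    """Marginal probability of each rating for user u and item i."""
    # Sum over group pairs, weighted by the mixed memberships.
    return np.einsum("k,l,kls->s", theta[u], eta[i], p)

# Sample one synthetic rating per (user, item) pair.
ratings = np.array([[rng.choice(S, p=rating_probs(u, i))
                     for i in range(n_items)]
                    for u in range(n_users)])
print(ratings)
```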

      The last problem we studied is symbolic regression, which consists in finding the best model in the space of closed-form mathematical expressions: a model that fits the data without being overly complex. Here we study when, given a noisy dataset, the true model can be recovered. We use the Bayesian machine scientist, which relies on a Bayesian formulation: its prior is built from the corpus of expressions in Wikipedia, favoring models with attributes similar to those of known models and thus avoiding overly complex expressions. We applied this procedure to synthetic data in which we control the noise and already know the true models, so we can tell whether the inference succeeds. We find that for low noise levels the algorithm identifies models of complexity similar to that of the true ones, but for higher noise levels it proposes simpler models, because they fit the noisy data better.
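
      The fit-versus-complexity trade-off behind this result can be illustrated with a toy model selection over polynomial expressions, scored with a BIC-style criterion as a crude stand-in for the Bayesian machine scientist's description-length score; the data, candidates, and noise level below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data from a "true" model y = 2x + x^2 with Gaussian noise.
x = np.linspace(-2, 2, 50)
sigma = 0.5                      # noise level we control
y = 2 * x + x**2 + rng.normal(0, sigma, x.size)

# Candidate closed-form models, described by the monomials they use
# (a tiny stand-in for a search over expression space).
candidates = {
    "a*x":                 [1],
    "a*x + b*x^2":         [1, 2],
    "a*x + b*x^2 + c*x^3": [1, 2, 3],
}

def bic(degrees):
    """BIC-style score: goodness of fit plus a complexity penalty."""
    X = np.column_stack([x**d for d in degrees])    # design matrix
    params, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
    rss = np.sum((y - X @ params) ** 2)             # residual sum of squares
    n, k = x.size, len(degrees)
    return n * np.log(rss / n) + k * np.log(n)      # lower is better

best = min(candidates, key=lambda name: bic(candidates[name]))
print("selected model:", best)
```

      Raising sigma in this sketch makes the penalized fit of the simpler candidates relatively better, mirroring the transition toward simpler models at higher noise levels described above.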

