
Dialnet


Value function estimation in optimal control via Takagi-Sugeno models and linear programming

  • Authors: Henry Díaz Iza
  • Thesis supervisors: Antonio Sala (supervisor), Leopoldo Armesto Angel (supervisor)
  • Defended at: Universitat Politècnica de València (Spain), 2020
  • Language: Spanish
  • Thesis examination committee: Matilde Santos Peñas (chair), Ángel Valera Fernández (secretary), Saso Blazic (member)
  • Doctoral programme: PhD Programme in Automatic Control, Robotics and Industrial Informatics, Universitat Politècnica de València
  • Subjects:
  • Links
    • Open-access thesis at: RiuNet
  • Abstract
    • The present Thesis employs dynamic programming and reinforcement learning techniques to obtain optimal policies for controlling nonlinear systems with discrete and continuous states and actions. First, the basic concepts of dynamic programming and reinforcement learning are reviewed for systems with a finite number of states. Then, the extension of these techniques to systems with a large number of states, or with continuous state spaces, is analysed using function approximators.
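      For a finite number of states, the dynamic-programming concepts reviewed here reduce to the classical value-iteration recursion. A minimal sketch, using a small hypothetical randomly generated MDP (the sizes, transition tensor and rewards below are illustrative, not from the thesis):

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP: P[s, a, s'] transition probabilities,
# R[s, a] immediate rewards, discount factor gamma.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))

V = np.zeros(n_states)
for _ in range(500):
    # Bellman optimality backup: V(s) <- max_a [R(s,a) + gamma * sum_s' P(s,a,s') V(s')]
    Q = R + gamma * P @ V
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged Q-function
```

      Because the backup is a gamma-contraction, the iteration converges geometrically to the unique fixed point of the Bellman optimality equation.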

      The contributions of the Thesis are:

      -A combined identification/Q-function fitting methodology, which involves identifying a Takagi-Sugeno model, computing (sub)optimal controllers from Linear Matrix Inequalities, and subsequently fitting the Q-function to data via monotonic optimisation.
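      As a loose illustration of the data-based Q-function fitting step, the following sketch runs a generic least-squares fitted Q-iteration on a hypothetical scalar linear system with quadratic cost; the thesis's actual method relies on Takagi-Sugeno models, LMIs and monotonic optimisation, none of which are shown here, and all system parameters below are made up:

```python
import numpy as np

gamma = 0.95
a_true, b_true = 0.8, 0.5  # hypothetical scalar system: x+ = a*x + b*u

# Collect transition data (x, u, stage cost, x_next)
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 200)
u = rng.uniform(-1, 1, 200)
cost = x**2 + 0.1 * u**2
x_next = a_true * x + b_true * u

def feats(x, u):
    # quadratic regressors: Q(x, u) ~ theta . [x^2, x*u, u^2]
    return np.stack([x**2, x * u, u**2], axis=-1)

theta = np.zeros(3)
for _ in range(100):
    # greedy action for a quadratic Q: argmin_u' gives u' = -theta1*x'/(2*theta2)
    if theta[2] > 1e-9:
        u_next = -theta[1] * x_next / (2 * theta[2])
    else:
        u_next = np.zeros_like(x_next)
    # Bellman target, then least-squares refit of the regressor weights
    target = cost + gamma * feats(x_next, u_next) @ theta
    theta, *_ = np.linalg.lstsq(feats(x, u), target, rcond=None)
```

      For this linear-quadratic case the quadratic regressors are exact, so the fit reproduces plain Q-iteration and the greedy gain stabilises the system.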

      -A methodology for learning controllers using approximate dynamic programming via linear programming. The methodology enables the ADP-LP approach to work in practical control applications with continuous state and input spaces. It estimates lower and upper bounds of the optimal value function through functional approximators, and guidelines are provided for data and regressor regularisation in order to obtain satisfactory results while avoiding unbounded or ill-conditioned solutions.
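      The lower-bound idea behind the LP formulation can be seen already in the finite-state case (a stand-in for the thesis's continuous-state setting): in a cost-minimisation problem, any V with V(s) <= C(s,a) + gamma * E[V(s')] for all (s, a) is a pointwise lower bound of the optimal cost-to-go, and maximising the sum of V over those constraints recovers it exactly. A sketch with hypothetical data:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical finite MDP with nonnegative stage costs C[s, a]
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
C = rng.uniform(0.0, 1.0, (n_states, n_actions))

# Bellman inequality constraints: V(s) - gamma * sum_s' P(s,a,s') V(s') <= C(s,a)
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        row = -gamma * P[s, a].copy()
        row[s] += 1.0
        A_ub.append(row)
        b_ub.append(C[s, a])

# linprog minimises, so negate the objective to maximise sum_s V(s)
res = linprog(c=-np.ones(n_states), A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(None, None)] * n_states)
V_lp = res.x  # equals the optimal cost-to-go at the LP optimum
```

      In the continuous-state setting, V is replaced by a functional approximator and the constraints are imposed only at sampled data points, which is where the regularisation guidelines above become necessary.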

      -A methodology of approximate dynamic programming via linear programming that obtains a better approximation of the optimal value function in a specific region of the state space. The methodology gradually learns a policy using only the data available in the exploration region; exploration progressively enlarges the learning region until a converged policy is obtained.

