Nested filtering methods for Bayesian inference in state space models

Sara Pérez Vieites

Authors: Sara Pérez Vieites
Thesis supervisors: Joaquín Miguez Arenas (supervisor)
Defended: at Universidad Carlos III de Madrid (Spain) in 2022
Language: Spanish
Thesis examination committee: Víctor Elvira Arregui (chair), Stefano Cabras (secretary), David Luengo García (member)
Doctoral programme: Doctoral Programme in Multimedia and Communications, Universidad Carlos III de Madrid and Universidad Rey Juan Carlos
Subjects:
- Mathematics
  - Probability
  - Statistics
    - Distribution theory and probability
    - Statistical inference techniques
Links
- Open-access thesis at: e-Archivo
Abstract
- A feature common to many problems in some of the most active fields of science is the calibration of dynamical models and their subsequent use to forecast the time evolution of high-dimensional dynamical systems using sequentially collected data. 'Calibration' may have different implications in different problems, but most often it refers to the estimation of a set of unknown, static parameters using real-world data. Of course, prediction and tracking on the one hand, and parameter estimation on the other, are closely related. Typically, the same data are used for both tasks and, in problems where observations are collected sequentially and online, we would ideally like to have algorithms for joint parameter estimation, and tracking and prediction of dynamical variables.
  
  Many examples can be found in meteorology [7], oceanography [13] and climate modelling [9], where current models for global weather forecasting involve the tracking of millions of time-varying state variables. This problem is not confined to geophysics, though. In biochemistry it is often necessary to estimate the evolution of populations of different types of reacting molecules, which usually involves estimating the parameters that govern the interaction between them as well [10]. A similar problem needs to be solved in ecology, when forecasting the populations of prey and predator species as they interact [3,4]. In neuroscience we can also find problems that need state tracking and parameter estimation, such as the ones involving the FitzHugh–Nagumo model [11] (characterizing the functioning of an excitable system like a cell or a neuron) and the Hodgkin–Huxley model [5] (describing how action potentials in neurons are initiated and propagated). Similar examples arise in other fields such as quantitative finance and engineering. One of the most typical problems in finance is related to stochastic volatility models [1,12,17], which are used to price derivative securities such as options, while target tracking [18] is a classical problem in engineering that has a wide range of applications such as surveillance, air traffic control, aerospace, robotics, remote sensing and computer vision.
  
  Traditionally, however, the two inferential tasks have been addressed separately, and few methods compute the full posterior probability distribution of all the unknown variables and parameters of the model. In recent years, some methods that accomplish this task have been proposed. They are well-principled probabilistic methods that solve the joint problem numerically and are supported by rigorous performance analyses [2,6,8,14,15]. From the viewpoint of Bayesian analysis, these conditional, or posterior, distributions contain all the information relevant for the estimation task. From them, one can compute point estimates of the parameters and states and also quantify the estimation error. Some examples are the sequential Monte Carlo square (SMC^2) [6], the particle Markov chain Monte Carlo (PMCMC) [2] and the nested particle filter (NPF) [8] methods. However, both SMC^2 and PMCMC are batch (non-recursive) techniques: every time a new observation arrives, the whole sequence of observations may have to be re-processed from scratch in order to update the estimates, leading to a quadratic increase of the computational effort over time. As an alternative, NPFs [8] apply the same principles as SMC^2 in a recursive way: the updated estimates are computed from the previous ones and the new observation alone, without re-processing the whole sequence of observations. However, the use of two layers of intertwined sequential Monte Carlo (SMC) algorithms makes the computational cost of the NPF prohibitive in high-dimensional problems.
  
  This algorithm was the starting point of the research of this thesis, since the first objective was to simplify the NPF in order to create more efficient algorithms. In particular, we have studied different ways of replacing the SMC schemes in the NPF. The first solution we have proposed is the replacement of the SMC modules of the second layer of the algorithm by a bank of Kalman filters. We have chosen Kalman filters because they are one of the simplest known filtering techniques and, in addition, we have modified the second layer of the nested scheme because the implementation in the state tracking layer is less complex. We specifically explore the combination of Monte Carlo and quasi–Monte Carlo approximations in the first layer, including SMC and sequential quasi-Monte Carlo (SQMC), with standard Gaussian filtering methods in the second layer, such as the ensemble Kalman filter (EnKF) and the extended Kalman filter (EKF). However, other algorithms can fit naturally within the framework. In particular, we have assessed, for the two-scale stochastic Lorenz 96 system, four algorithms: SMC-EnKF, SMC-EKF, SQMC-EnKF and SQMC-EKF. These two-layer schemes estimate the static unknown parameters as well as the slow dynamical state variables, while the contribution of the fast variables is replaced by a polynomial ansatz which has to be fitted as well. Within this framework, we see that the resulting algorithms outperform other methods such as the two-stage filter [16] and the NPF. The use of Gaussian filters in the second layer of the nested hybrid filters not only leads to a significant reduction in computational complexity compared to the NPF, but this is attained without a significant loss of accuracy. The selection of filtering techniques in each layer depends on the computational cost one can afford, obtaining better performance with the most computationally complex methods. 
  Therefore, the proposed framework enables a trade-off between accuracy and computational cost.
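
  The two-layer idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a toy scalar linear-Gaussian model (x_t = a*x_{t-1} + u_t, y_t = x_t + v_t) with a single unknown parameter `a`, an SMC first layer that jitters all parameter particles, and an exact scalar Kalman filter per particle in the second layer. All names (`kalman_step`, `nested_hybrid_filter`, `simulate`) and all numerical values are hypothetical choices made for illustration.

```python
import math
import random


def kalman_step(m, p, a, y, q=0.1, r=0.1):
    """One predict/update step of a scalar Kalman filter.

    Assumed toy model: x_t = a*x_{t-1} + u_t, y_t = x_t + v_t, with noise
    variances q and r. Returns the updated mean, the updated variance and
    the log predictive likelihood of y given the parameter a.
    """
    m_pred = a * m
    p_pred = a * a * p + q
    s = p_pred + r                      # innovation variance
    k = p_pred / s                      # Kalman gain
    innov = y - m_pred
    m_new = m_pred + k * innov
    p_new = (1.0 - k) * p_pred
    loglik = -0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
    return m_new, p_new, loglik


def nested_hybrid_filter(ys, n_particles=200, jitter_std=0.02, seed=0):
    """First layer: SMC over the parameter. Second layer: one Kalman filter
    per parameter particle. Returns the posterior-mean estimate of the
    parameter after each observation."""
    rng = random.Random(seed)
    thetas = [rng.uniform(0.0, 1.0) for _ in range(n_particles)]  # assumed prior
    means = [0.0] * n_particles
    variances = [1.0] * n_particles
    estimates = []
    for y in ys:
        # Jitter the (static) parameter particles to keep them diverse.
        thetas = [th + rng.gauss(0.0, jitter_std) for th in thetas]
        # Second layer: each Kalman filter advances one step and supplies
        # the likelihood used to weight its parameter particle.
        logw = []
        for i in range(n_particles):
            means[i], variances[i], ll = kalman_step(
                means[i], variances[i], thetas[i], y)
            logw.append(ll)
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        estimates.append(sum(wi * th for wi, th in zip(w, thetas)))
        # Resample parameters together with their Kalman filter states.
        idx = rng.choices(range(n_particles), weights=w, k=n_particles)
        thetas = [thetas[i] for i in idx]
        means = [means[i] for i in idx]
        variances = [variances[i] for i in idx]
    return estimates


def simulate(a_true=0.8, steps=300, q=0.1, r=0.1, seed=1):
    """Draw synthetic observations from the assumed toy model."""
    rng = random.Random(seed)
    x, ys = 0.0, []
    for _ in range(steps):
        x = a_true * x + rng.gauss(0.0, math.sqrt(q))
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return ys


if __name__ == "__main__":
    print(nested_hybrid_filter(simulate())[-1])
```

  Each observation is processed once, so the cost per time step stays constant. In this sketch, replacing `kalman_step` with an EKF or EnKF step would give the SMC-EKF or SMC-EnKF flavour mentioned above.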
  
  Additionally, we obtain theoretical guarantees on the convergence of, at least, the family of nested hybrid filters that employ Monte Carlo (or quasi-Monte Carlo) in the first layer of the nested scheme. In particular, we have proved that the approximate posterior distribution of the parameters output by the first layer of the nested filter converges to a certain limit distribution that depends on the algorithm used in the second layer. Note that this is not a guarantee of convergence towards the true posterior distribution of the parameters, but possibly towards a biased probability distribution. The bias of the latter depends on the choice of filters used in the second layer of the algorithm: the resulting algorithm is biased if the filters in the second layer are biased (as is the case in general with approximate Gaussian filters). However, the result guarantees that a limit distribution exists and that the convergence rate is the classical Monte Carlo rate.
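
  For orientation, a bound at the classical Monte Carlo rate typically has the following shape. The notation is an assumption made here, not taken from the abstract: f is a bounded test function, \mu_t^N is the first-layer approximation of the parameter posterior built with N samples, \bar{\mu}_t is the limit distribution (not necessarily the true posterior), (f, \mu) denotes the integral of f with respect to \mu, and c_t is a constant independent of N. The precise norm and the assumptions under which the result holds are given in the thesis, not here.

```latex
\left\| (f, \mu_t^N) - (f, \bar{\mu}_t) \right\|_p
  \le \frac{c_t \, \| f \|_\infty}{\sqrt{N}}
```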
  
  The use of Kalman filters in the second layer of the method worked satisfactorily, and this led us to introduce Kalman-based filters in the first layer of the nested schemes as well. We have then introduced a class of schemes that can incorporate deterministic sampling techniques (such as the cubature Kalman filter (CKF) or the unscented Kalman filter (UKF)) in the first layer of the algorithm, instead of the Monte Carlo-based methods employed in the original procedure. As all the methods used in this scheme are Gaussian, we refer to this class of algorithms as nested Gaussian filters. In particular, we implement a UKF in the first layer while there is a bank of EKFs in the second layer. Unfortunately, the use of non-Monte Carlo methods in the first layer leads to a non-straightforward problem. The key difficulty is to keep the algorithm recursive. The reason for this is that the jittering procedure (used both in the NPF and in the nested hybrid filters) cannot be employed anymore. The jittering step consists in drawing a new set of parameter particles at each discrete-time step even if the parameters are static. This is done with a Markov kernel, which either perturbs (jitters) a few particles with arbitrary variance (while leaving most of them unperturbed) or jitters all particles with a controlled variance that decreases as the number of samples increases. Without this step, the diversity of the values of the parameter particles decreases sharply after a few resampling steps, leading to poor approximations of the parameter posterior probability distributions.
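
  The two jittering options described above can be sketched as follows for scalar parameter particles. This is only an illustration: the Gaussian perturbation, the function names and the power-law scaling `scale * n ** (-exponent)` are assumptions, and the abstract does not specify the kernel or the rate at which the variance should shrink.

```python
import random


def jitter_few(particles, prob, std, rng):
    """Perturb each particle independently with probability `prob`, using an
    arbitrary standard deviation `std`; the other particles are unchanged."""
    return [p + rng.gauss(0.0, std) if rng.random() < prob else p
            for p in particles]


def jitter_all(particles, scale, exponent, rng):
    """Perturb every particle with a standard deviation that shrinks as the
    number of particles grows (hypothetical power-law scaling)."""
    n = len(particles)
    std = scale * n ** (-exponent)
    return [p + rng.gauss(0.0, std) for p in particles]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.5] * 1000
    print(sum(a != b for a, b in zip(theta, jitter_few(theta, 0.05, 1.0, rng))))
    print(max(abs(a - 0.5) for a in jitter_all(theta, 1.0, 0.75, rng)))
```

  Either function would be applied to the parameter particles at every time step, before the second-layer filters are advanced.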
  
  Jittering cannot be extended to Gaussian filters in a practical way. Instead, we have made the update of the filter in the outer layer dependent on a distance defined on the parameter space. When the distance between consecutive parameter estimates falls below a prescribed threshold, the algorithm operates in a purely recursive manner. However, the selection of this threshold is not trivial and could vary from one problem to another. A poor choice of the threshold may lead to two possible scenarios. First, when the threshold is too small, the algorithm operates non-recursively more often than it should, which drastically increases the computational cost of the resulting method. Second, when the threshold is too large, the computational cost of the algorithm decreases in exchange for greater errors in the approximations of the probability density functions of the parameters and the state variables. We have carried out a specific study of the choice of the threshold and shown, as a result, that the nested Gaussian filters can be more efficient than the previous nested algorithms.
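
  The switching rule can be sketched in a few lines. The relative Euclidean distance used here is an assumption (the abstract only says that a distance on the parameter space and a threshold are used), and the function names are hypothetical.

```python
import math


def relative_distance(theta_new, theta_old):
    """Euclidean distance between consecutive parameter estimates, divided
    by the norm of the previous estimate (assumed normalisation)."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_new, theta_old)))
    if num == 0.0:
        return 0.0
    den = math.sqrt(sum(b * b for b in theta_old))
    return num / den if den > 0.0 else float("inf")


def can_update_recursively(theta_new, theta_old, threshold):
    """True when the estimates are close enough to reuse the second-layer
    filter states; False when they would have to be recomputed from the
    stored observations (the non-recursive, more expensive branch)."""
    return relative_distance(theta_new, theta_old) < threshold


if __name__ == "__main__":
    print(can_update_recursively([10.1, 28.0], [10.0, 28.0], 0.01))
    print(can_update_recursively([12.0, 28.0], [10.0, 28.0], 0.01))
```

  A smaller `threshold` makes the second branch more frequent and the filter more expensive; a larger one keeps the cost low at the price of larger approximation errors, which is the trade-off described above.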
  
  For this algorithm, we present numerical results for a stochastic Lorenz 63 model using synthetic data, as well as for a stochastic volatility model with real-world data (namely, euro-to-USD exchange rates between 2014 and 2016). We have introduced and assessed the values of a relative threshold that enables the algorithm to work recursively, and we have evaluated the performance of the algorithm in terms of the normalized mean square errors for the parameters and the dynamic state variables. We have also compared these results with those of other algorithms: the EnKF and the UKF with state augmentation (i.e., an extended state that includes both parameters and dynamical variables), an NPF, and a nested hybrid filter that incorporates an SMC scheme in the first layer and EKFs in the second layer. The introduction of Gaussian techniques in both the first and second layers of the algorithm entails another improvement in the performance of the nested methodology, since we obtain errors similar to those of the NPF and the nested hybrid filters (NHFs), but with a further reduction in the computational cost. Also, the nested Gaussian filters are considerably more accurate than Gaussian filters that implement state augmentation.
  
  An alternative way to think of the unknown parameters and the dynamic variables in a state space model is to treat them as two sets of state variables that evolve over different time scales: the unknown parameters are slow variables (so slow that we can work with them as if they were genuinely static) while the dynamic variables are rapidly time-varying in comparison. Following this argument, it makes sense to extend the nested filtering methodology to general multi-scale state space models where different subsets of state variables evolve over different time scales. This realization has led to the multi-scale nested filters, which apply the nested hybrid filtering methodology to a class of heterogeneous multi-scale models by relating each relevant time scale to a layer of computation in the nested structure. In particular, we have described a three-layer nested smoother that approximates, in a recursive manner, the posterior probability distributions of the parameters and two sets of state variables (fast and slow ones) given the sequence of available observations. The computations on the second layer are conditional on the candidate parameter values generated on the first layer, while the calculations on the third layer are conditional on the candidates drawn at the first and second layers. As in the previous methodologies, the inference techniques used in each layer can vary, which leads to different algorithms.
  
  The two-scale stochastic Lorenz 96 system has also been used for assessment, although in this case we estimate the static unknown parameters as well as both the slow and fast dynamical state variables. We have studied the performance of two algorithms with three layers of inference that derive from this scheme, combining Monte Carlo methods and Gaussian filters at different layers. The first method uses SMC methods in both the first and second layers, together with a bank of UKFs in the third layer (i.e., an SMC-SMC-UKF algorithm). The second method employs an SMC scheme in the first layer, EnKFs in the second layer and a bank of EKFs in the third layer (i.e., an SMC-EnKF-EKF algorithm). The computational cost increases considerably compared with any two-layer algorithm, but in exchange we obtain estimates for all the dynamical variables. The proposed implementations perform the inference task effectively, but additional research is needed to optimise the algorithms and compare them with alternative techniques.
  
  All the methods and algorithms described in this thesis, as well as the NPF, belong to the same class of methods. Therefore, this thesis describes a generalised nested methodology, structured in (two or more) intertwined layers, that comprises all of them. This is essentially a probabilistic methodology that aims at recursively computing the sequence of posterior probability distributions of the unknown model parameters and its (time-varying) state variables conditional on the available observations.
  
  References
  
  [1] Ömer Deniz Akyildiz and Joaquín Míguez. Nudging the particle filter. Statistics and Computing, 30(2):305–330, 2020.
  
  [2] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society B, 72:268–342, 2010.
  
  [3] David Barber and Yali Wang. Gaussian processes for Bayesian estimation in ordinary differential equations. In International Conference on Machine Learning, pages 1485–1493. PMLR, 2014.
  
  [4] Richard J Boys, Darren J Wilkinson, and Thomas BL Kirkwood. Bayesian inference for a discretely observed stochastic kinetic model. Statistics and Computing, 18(2):125–135, 2008.
  
  [5] Laure Buhry, Filippo Grassia, Audrey Giremus, Eric Grivel, Sylvie Renaud, and Sylvain Saïghi. Automated parameter estimation of the Hodgkin-Huxley model using the differential evolution algorithm: application to neuromimetic analog integrated circuits. Neural Computation, 23(10):2599–2625, 2011.
  
  [6] Nicolas Chopin, Pierre E Jacob, and Omiros Papaspiliopoulos. SMC^2: an efficient algorithm for sequential analysis of state space models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3):397–426, 2013.
  
  [7] A. M. Clayton, A. Lorenc, and D. M. Barker. Operational implementation of a hybrid ensemble/4D-Var global data assimilation system at the Met Office. Quarterly Journal of the Royal Meteorological Society, 139(675):1445–1461, 2013.
  
  [8] Dan Crisan, Joaquín Míguez, et al. Nested particle filters for online parameter estimation in discrete-time state-space Markov models. Bernoulli, 24(4A):3039–3086, 2018.
  
  [9] D. P. Dee, S. M. Uppala, A. J. Simmons, P. Berrisford, P. Poli, S. Kobayashi, U. Andrae, M. A. Balmaseda, G. Balsamo, and P. Bauer. The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society, 137(656):553–597, 2011.
  
  [10] A. Golightly and D. J. Wilkinson. Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo. Interface Focus, 1(6):807–820, 2011.
  
  [11] Anders Chr Jensen, Susanne Ditlevsen, Mathieu Kessler, and Omiros Papaspiliopoulos. Markov chain Monte Carlo approach to parameter estimation in the FitzHugh-Nagumo model. Physical Review E, 86(4):041114, 2012.
  
  [12] Sangjoon Kim, Neil Shephard, and Siddhartha Chib. Stochastic volatility: likelihood inference and comparison with ARCH models. The Review of Economic Studies, 65(3):361–393, 1998.
  
  [13] P. J. Van Leeuwen. A variance-minimizing filter for large-scale applications. Monthly Weather Review, 131(9):2071–2084, 2003.
  
  [14] I. P. Mariño, A. Zaikin, and J. Míguez. A comparison of Monte Carlo-based Bayesian parameter estimation methods for stochastic models of genetic networks. PLOS ONE, 12(8):e0182015, 2017.
  
  [15] J. Míguez, I. P. Mariño, and M. A. Vázquez. Analysis of a nonlinear importance sampling scheme for Bayesian parameter estimation in state-space models. Signal Processing, 142:281–291, January 2018.
  
  [16] Naratip Santitissadeekorn and Christopher Jones. Two-stage filtering for joint state-parameter estimation. Monthly Weather Review, 143(6):2028–2042, 2015.
  
  [17] Audrone Virbickaite, Hedibert F Lopes, M Concepción Ausín, and Pedro Galeano. Particle learning for Bayesian semi-parametric stochastic volatility model. Econometric Reviews, 2019.
  
  [18] Xuedong Wang, Tiancheng Li, Shudong Sun, and Juan M Corchado. A survey of recent advances in particle filters and remaining challenges for multitarget tracking. Sensors, 17(12):2707, 2017.
