# Learning non-linear dynamical systems with GPSSM

Much of the time, humans take time itself for granted. We generally interpret the ongoing sequence of events that makes up our days in subjective, even unconscious, ways. We only pay close attention to time when we are planning and coordinating with others (e.g., Dinner at 8? –> traffic in the city centre? –> leave by 7). But what happens when we delegate such decisions to AI? Can an autonomous system automatically discover correlation and causation within time sequences and detect useful patterns that can drive decisions and actions? This is one of our main focuses at PROWLER.io: building probabilistic models that can better detect and make use of the patterns in time that underlie observed data. One family of models that we deem particularly promising for this area is Gaussian process state space models (GPSSM – see Frigola et al. 2014). Our recent NIPS paper on Identification of Gaussian Process State Space Models looks at how we can learn the GPSSM model using a deep recognition model. We’re excited to be able to share these findings with a wider audience.

### How can we model dynamics?

The simplest approach to modelling time dependencies is to define the problem as a recursive operation applied to available data. When we model the value of a particular asset over time, we can describe the relationship between the observed value at subsequent time steps using a transition function \( f \), so that \( \$(t) = f(\$(t-1)) \). The broader family of these models is known as autoregressive (AR) models. If we know the correct transition function \(f \), we can apply it to the current observation and any available action, and estimate the next step in the sequence. Repeating this procedure can then generate long-term predictions.
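As a quick sketch, this recursion is just repeated application of \( f \). The mean-reverting transition below is a made-up stand-in for whatever transition function an AR model would actually learn from data:

```python
import numpy as np

def rollout_ar(f, y0, n_steps):
    """Generate an n-step forecast by repeatedly applying the transition f."""
    ys = [y0]
    for _ in range(n_steps):
        ys.append(f(ys[-1]))
    return np.array(ys)

# Hypothetical transition: the asset value mean-reverts towards 100.
f = lambda y: y + 0.5 * (100.0 - y)
trajectory = rollout_ar(f, y0=80.0, n_steps=10)
```

Each forecast feeds the previous prediction back into \( f \), which is exactly how long-term predictions are generated – and also why errors compound over long horizons.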

In finance, for example, we cannot assume that the value of an asset at time step \(t \) depends only on its previous value. Many other factors that don’t directly appear in the data can have huge impacts on asset price. Has the UK agreed with the EU on a smooth Brexit? What’s Trump tweeting about it? With machine learning, we try to model part of the unobserved information by inferring what we call latent variables or latent states that can then influence predictions.

The AR model that we describe above can be slightly modified in order to allow for these interactions. All we need to do is apply the transition function \( f \) to the latent states, assuming that we can recover the observed sequence from the latent state. This is the role of the measurement function \( g \), which relates latent states and observations. We write \( x_t = f( x_{t-1}, a_{t-1}) \) and \( y_t = g( x_t) \), where \( x_t, a_t \) are the latent state and action at time step \( t \), \( y_t \) is the associated observation and \(f, g \) are the transition and measurement functions, respectively. This general family of models is known as state-space models. We can make the observation depend on all past unobserved latent factors by just integrating them out. But how can we learn such a model? How should we select the transition and the measurement function? This is the task of system identification.
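The generative loop of a state-space model can be sketched in a few lines; the particular \( f \) and \( g \) below are toy placeholders, not learned functions:

```python
import numpy as np

def rollout_ssm(f, g, x0, actions):
    """Roll latent states through f, then map each state to an observation via g."""
    xs, ys = [x0], [g(x0)]
    for a in actions:
        xs.append(f(xs[-1], a))   # latent transition: x_t = f(x_{t-1}, a_{t-1})
        ys.append(g(xs[-1]))      # measurement:       y_t = g(x_t)
    return np.array(xs), np.array(ys)

# Toy example: a decaying scalar latent state nudged by actions,
# observed through a simple linear read-out.
f = lambda x, a: 0.9 * x + a
g = lambda x: 2.0 * x
xs, ys = rollout_ssm(f, g, x0=1.0, actions=[0.1, 0.1, 0.1])
```

Note that the observations \( y \) never interact with each other directly; all temporal structure flows through the latent chain of \( x \) values.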

### Learning and inference in systems with linear dynamics

Part of the expressive power of state-space models comes from our choice for the transition function \( f \). Once the function is known, we can propagate the latent states (associated with past observations) through it and make more informative predictions. However, learning a state-space model is challenging, mainly because the transition function and its latent inputs/outputs are unknown (i.e., the classical supervised learning paradigm that we use for training AR models no longer applies). Additionally, many state–function pairs could explain the observed data equally well, which makes the system unidentifiable. To recover the desired solution we need to carefully design the model and make the right assumptions.

First, we need to understand our data and dynamics. To use a simplified example, imagine observing the position of a car that can only move forwards and backwards on a track as a result of the kinetic force (action) applied to it. Can we predict its future position if we already know the future applied kinetic force? Of course we can. So we can design its state-space model with the following assumptions: 1) the car’s observed position can be regarded as the noisy observation \( y \) (not unlike a noisy reading from a GPS signal) and 2) the unobserved true state \( x \) of the car will be its velocity and acceleration. In this idealised system, we know that velocity and acceleration depend linearly on each other in subsequent time steps. We also know that position relates linearly to both velocity and acceleration. Consequently, we choose linear models for both the transition function \(f \) and the measurement function \(g \).
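In the linear case, \( f \) and \( g \) reduce to matrices. The sketch below is one hypothetical way to write them down for the car (the post does not specify the exact discretisation, so the values of `A`, `B` and `C` here are purely illustrative):

```python
import numpy as np

dt = 0.1  # hypothetical discretisation step

# Transition matrix on the state x = [velocity, acceleration]:
# velocity picks up dt * acceleration; acceleration decays slightly.
A = np.array([[1.0, dt],
              [0.0, 0.99]])
B = np.array([0.0, dt])  # the applied force (action) perturbs the acceleration

# Measurement vector: an illustrative linear read-out of the state
# standing in for the position observation y.
C = np.array([dt, 0.0])

def f(x, a):
    """Linear transition: x_t = A x_{t-1} + B a_{t-1}."""
    return A @ x + B * a

def g(x):
    """Linear measurement: y_t = C x_t."""
    return C @ x
```

The important point is that once \( f \) and \( g \) are linear (and the noise Gaussian), every distribution we need later stays Gaussian, which is what makes the learning procedure below tractable in closed form.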

Now that we have chosen the model class, all that is left to do is to identify it, i.e. learn the mappings \(f \) and \(g \). The best parameters of these functions are ones that maximise the probability of observing a sequence of car positions after taking into account the car’s velocities and accelerations, i.e. the likelihood of the model parameters. In the linear case, we can learn these parameters analytically by iterating between the following two steps:

• Compute the current velocity and acceleration given all observed positions up to \(t-1 \), \(p( x_t | y_{1:t-1}) \) (integrate the previous latent state across the transition function).

• Compute the probability of all observed positions up to \(t \), \(p( y_{1:t}) \), (integrate all available latent states across the transition and measurement functions).

In the dynamical systems literature, the first step is known as the prediction step and the second as the filtering step. Once the system is known, inferring future states is as simple as repeating the two steps: 1) predicting the next latent state via the transition function; and 2) generating the future observation from the predicted latent state via the measurement function. In the following figure, we illustrate this process of fitting the training data and making predictions with a linear dynamical system for the aforementioned car example. Both the fitting and the predicted trajectory seem to be very good, which suggests that we have effectively identified the dynamical system.
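The two steps above are exactly one cycle of the Kalman filter. A minimal sketch of a single predict/filter cycle follows; the matrices in the usage example are illustrative, not the ones fitted for the car:

```python
import numpy as np

def kalman_step(mu, P, y, A, C, Q, R):
    """One predict/filter cycle of a linear-Gaussian state-space model.

    mu, P: mean and covariance of the current latent state estimate.
    A, C:  transition and measurement matrices; Q, R: their noise covariances.
    """
    # Prediction step: push the state estimate through the linear transition.
    mu_pred = A @ mu
    P_pred = A @ P @ A.T + Q
    # Filtering step: correct the prediction with the new observation y.
    S = C @ P_pred @ C.T + R                # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)     # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    P_new = (np.eye(len(mu)) - K @ C) @ P_pred
    return mu_new, P_new

# Illustrative 2-D latent state observed through its first component.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[0.1]])
mu, P = np.zeros(2), np.eye(2)
mu, P = kalman_step(mu, P, np.array([1.0]), A, C, Q, R)
```

Running the cycle over a whole sequence of observations yields \( p(x_t \mid y_{1:t}) \) at every step, and accumulating the innovation terms gives the likelihood used to learn the model parameters.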

### Non-linear dynamics and the GPSSM

What happens, though, in a more complicated scenario? Now imagine a freely swinging pendulum attached to the car, so that the pendulum swings around as the car moves along the track. Should we expect that such a dynamical system can easily be recovered by our linear model? Unfortunately not, as we can see in our attempt below. The reason behind our poor predictions is that we cannot model the interplay between the pendulum and the car using a linear transition function.

Fortunately, there are many proven methods for the identification of non-linear systems (Särkkä 2013). Here at PROWLER.io, we are very excited about GPSSMs, in which the transition function is a Gaussian process (GP). The use of the non-linear framework of GPs allows the GPSSM to recover more complicated and non-smooth dynamics. More importantly, as purely probabilistic models, GPs can effectively model the uncertainty in our current state and in the future predictions.

What’s the catch? The GPSSM, like any non-linear system, is intractable. We cannot analytically integrate the information from the latent factors across time, due to the non-linear transition function. How then can we identify the system? In machine learning, when we can’t compute things exactly, we approximate. So in the GPSSM we have to approximate the transition function \(f \) and the latent states \( x \). We approximate the true transition function with a sparse Gaussian process (see James’s blog post). This approximation eases optimisation by bringing several advantages to the GPSSM (e.g., scalability and stochastic optimisation). We approximate the latent states with a linear chain, much as we did in the linear system. However, now we learn the parameters of this linear chain using a recurrent neural network (RNN) as a recognition model (Gershman and Goodman 2014).
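To give a flavour of the sparse GP transition (this is a toy sketch, not the algorithm from the paper), the code below rolls a latent state forward through the posterior mean of a noiseless GP conditioned on a set of hypothetical inducing points; the inducing values are chosen by hand to mimic a saturating non-linear transition:

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = X1[:, None] - X2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sparse_gp_mean(x_star, Z, u, jitter=1e-6):
    """Posterior mean of a GP conditioned on inducing inputs Z with values u."""
    Kzz = rbf(Z, Z) + jitter * np.eye(len(Z))
    Ksz = rbf(np.atleast_1d(x_star), Z)
    return (Ksz @ np.linalg.solve(Kzz, u))[0]

# Hypothetical inducing points encoding a saturating transition x -> tanh(x).
Z = np.linspace(-3.0, 3.0, 20)
u = np.tanh(Z)

# Roll the latent state forward through the approximate GP transition.
x = 2.0
states = [x]
for _ in range(5):
    x = sparse_gp_mean(x, Z, u)
    states.append(x)
```

In the real model the inducing values are learned variationally rather than fixed, and the transition returns a full predictive distribution instead of just a mean, which is what lets the GPSSM carry uncertainty through time.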

We can now identify the system by minimising a probabilistic measure of the difference between the true model and our approximations. Inference is then as easy as in the linear example: we first predict the next latent state via the approximate sparse GP and then generate an observation.

### Why all the excitement about the GPSSM?

Though the GPSSM is complicated, it can be trained fairly easily with the suggested approximations and gives consistent results. Once we have the knowledge and expertise to train the GPSSM, it can do pretty cool stuff:

• GPSSMs can model dynamics even from partially observed data. In the cart-pole example, the system can be fully described by four variables, i.e., cart position, cart velocity, pendulum angle and angular velocity. Here, we demonstrate that we can nicely predict future trajectories – even after observing only the cart’s position and the pendulum’s angle (the GPSSM hasn’t seen any data related to the velocities).

• GPSSMs can model the dynamics from misaligned input-output pairs. The equivalent example is having observations that associate the cart-pole state at time step \(t \) with the kinetic force applied to the cart at time step \(t-\mathrm{lag} \). We can recover the correlations in the GPSSM and produce reliable future trajectories, even with a partially observed state, as seen here:

This is only the beginning of some very promising results. Though not in its final state, GPSSM is an ongoing project at PROWLER.io that involves active research on improving the model. We think that GPSSM will be an excellent model for planning and decision making in reinforcement learning settings for finance, logistics and robot control.

**References:**

*R. Frigola, Y. Chen and C. E. Rasmussen. “Variational Gaussian Process State-Space Models”, in Advances in Neural Information Processing Systems, 2014.*

*S. J. Gershman and N. D. Goodman. “Amortized Inference in Probabilistic Reasoning”, in Proceedings of the Annual Conference of the Cognitive Science Society, 2014.*

*S. Särkkä. “Bayesian Filtering and Smoothing”, Cambridge University Press, 2013.*