# Five key papers from NeurIPS 2018

## Summarised by PROWLER.io Co-founder and CTO Dongho Kim

This year, PROWLER.io had eight papers accepted at the NeurIPS 2018 machine learning research conference. This is a significant achievement given our size and R&D spend as a company.

While US heavyweights like Google, Microsoft, IBM, Facebook and Intel may have had a greater number of papers accepted, we're punching well above our weight: we were also the highest-ranking company from outside the US.

In fact, if you look at it another way and compare the number of NeurIPS papers published for every $1 million of R&D spend, the figures speak for themselves.

In our first post following the NeurIPS 2018 event, here are five of the eight papers PROWLER.io is very proud to have worked on.

The lead authors of the three remaining NeurIPS papers will summarise the methods, findings and impact of their research in three further blog posts.

These papers include Learning Invariances using the Marginal Likelihood, which presents the deep learning approach of data augmentation in a probabilistic way; Gaussian Process Conditional Density Estimation, which brings together Gaussian processes and conditional generative modelling (and predicts the pick-up and drop-off points for NYC's yellow cabs); and Distributed Multitask Reinforcement Learning with Quadratic Convergence, which provides a scalable and efficient solution to aid many reinforcement learning tasks.

### Orthogonally Decoupled Variational Gaussian Processes

**Authors:** Hugh Salimbeni (PROWLER.io and Imperial College London), Ching-An Cheng (Georgia Institute of Technology), Byron Boots (Georgia Institute of Technology), Marc Deisenroth (PROWLER.io and Imperial College London)

This paper (and the following paper on *Infinite Horizon Gaussian Processes*) focuses on speeding up Gaussian process models. This, in turn, affects a wide range of applications where Gaussian processes are used such as Bayesian optimisation, robust control, last-mile deliveries, financial asset management, energy demand predictions, and airport resource demand predictions, to name a few.

In Gaussian process models, one has to learn both the mean function and variance function in order to make predictions with variance. Usually, the two functions are coupled in the sense that they share the same constituent parts, or basis functions. Since the variance function is more computationally expensive, it can be advantageous to use fewer constituent parts for that function. We show how to do this in an orthogonal way, which makes the joint-optimisation of the two functions easier.
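To make the decoupling concrete, here is a minimal numpy sketch of sparse-GP-style predictions in which the mean function uses a larger basis than the variance function. This is not the paper's orthogonal construction, and all sizes and weights are hypothetical, untrained values; it only illustrates that the two functions can use bases of different sizes.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix between two sets of inputs."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)

# Hypothetical basis locations: many for the mean, few for the variance,
# since the variance term is the computationally expensive one.
Z_mean = rng.uniform(-3, 3, size=(100, 1))   # basis for the mean function
Z_var = rng.uniform(-3, 3, size=(10, 1))     # smaller basis for the variance

a = rng.normal(size=(100,))                  # weights for the mean function
L = np.tril(rng.normal(size=(10, 10)))       # Cholesky factor of the variational covariance

def predict(X):
    """Decoupled predictions: mean and variance use separate bases."""
    mean = rbf(X, Z_mean) @ a
    Kzz = rbf(Z_var, Z_var) + 1e-6 * np.eye(10)
    Kxz = rbf(X, Z_var)
    # Sparse-GP-style variance: prior variance minus the explained part,
    # plus the variational covariance S = L L^T projected onto X.
    A = np.linalg.solve(Kzz, Kxz.T)          # (10, N)
    S = L @ L.T
    var = 1.0 - np.sum(Kxz.T * A, axis=0) + np.sum(A * (S @ A), axis=0)
    return mean, var

X = np.linspace(-3, 3, 5)[:, None]
m, v = predict(X)
```

The variance computation scales with the small basis (10 points) while the mean benefits from the large one (100 points); the paper's contribution is to parameterise this split orthogonally so the two sets of parameters can be optimised jointly without interference.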

The result is practical speed-ups for Gaussian process models that work well in practice and benefit a range of machine learning applications.

### Infinite Horizon Gaussian Processes

**Authors:** Arno Solin (Aalto University), James Hensman (PROWLER.io) and Richard Turner (University of Cambridge)

We made some algorithmic improvements to Gaussian processes, which make some of these probabilistic models much faster to run. For Gaussian process models with a single input dimension (usually time), an efficient way to incorporate new information is to use a time-ordered representation. This exploits the Markov (time-dependent) structure of the model, which is effective for long data sequences. We considered what would happen if we let the model run for a very long time, to the infinite horizon, and noticed that we could plug the solution in locally for an impressive computational advantage with only minor accuracy loss.
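As a rough illustration of the underlying idea (not the paper's implementation), here is a numpy sketch for the Matérn-1/2 kernel, whose state-space form is one-dimensional. The filter covariance converges as time runs on, so the steady-state solution can be computed once and a single fixed Kalman gain reused at every step, giving an O(n) pass with no per-step covariance updates. All hyperparameter values below are arbitrary.

```python
import numpy as np

# Matérn-1/2 (Ornstein–Uhlenbeck) GP as a 1-D state-space model.
ell, sigma2, R = 1.0, 1.0, 0.1      # lengthscale, signal variance, noise variance
dt = 0.01
A = np.exp(-dt / ell)               # state transition over one time step
Q = sigma2 * (1.0 - A**2)           # process noise variance

# Infinite-horizon idea: the filter covariance converges, so iterate the
# scalar Riccati recursion to its fixed point once, up front.
P = sigma2
for _ in range(1000):
    P_pred = A**2 * P + Q
    P = P_pred - P_pred**2 / (P_pred + R)
P_pred_inf = A**2 * P + Q
K_inf = P_pred_inf / (P_pred_inf + R)   # steady-state Kalman gain

def filter_mean(y):
    """O(n) filtering pass using the fixed steady-state gain."""
    m, means = 0.0, []
    for yk in y:
        m = A * m                       # predict one step ahead
        m = m + K_inf * (yk - m)        # update with the constant gain
        means.append(m)
    return np.array(means)

t = np.arange(0, 5, dt)
y = np.sin(t) + 0.3 * np.random.default_rng(1).normal(size=t.size)
m = filter_mean(y)
```

The per-step cost here is a couple of scalar operations regardless of sequence length, which is the kind of computational advantage the infinite-horizon approximation buys; the price is a small accuracy loss near the start of the sequence, before the true covariance has settled.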

We also detailed how to extend the model to non-Gaussian data (e.g. count data or binary data), and how to adapt the hyper-parameters of the Gaussian process in an online way.

### A Bayesian Approach to Generative Adversarial Imitation Learning (Spotlight)

**Authors:** Wonseok Jeon (KAIST), Seokin Seo (KAIST), and Kee-Eung Kim (KAIST & PROWLER.io)

Recently, generative adversarial training has shown promising results in learning generative distributions of high-dimensional data. This learning paradigm has been applied to imitation learning where, for example, a robot is taught to imitate a human demonstration.

While this learning paradigm is appealing, the number of demonstrations required for successful imitation still limits the applicability of this family of algorithms. In this paper, the authors propose a Bayesian approach to generative adversarial imitation learning that significantly enhances sample efficiency. Simply put, we can teach AI to imitate much faster.

### Monte-Carlo Tree Search for Constrained POMDPs

**Authors:** Jongmin Lee (KAIST), Geon-hyeong Kim (KAIST), Pascal Poupart (University of Waterloo), and Kee-Eung Kim (KAIST & PROWLER.io)

Reinforcement learning is one of the core machine learning algorithms. This paper brings fundamental improvements to the flexibility and applicability of such algorithms, which in turn could improve mission-critical decision making applications such as robotics, autonomous vehicles, stock trading, supply chain and logistics.

POMDP (Partially Observable Markov Decision Process) is the de facto mathematical framework for optimising sequential decisions under partial or uncertain observation, for example when controlling robots equipped with noisy sensors.
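The belief update at the heart of any POMDP can be sketched in a few lines: after taking action a and seeing observation o, the agent's distribution over hidden states is re-weighted by Bayes' rule. The two-state transition and observation probabilities below are a hypothetical toy, not numbers from the paper.

```python
import numpy as np

# Toy POMDP: 2 hidden states, a robot with a noisy sensor (hypothetical numbers).
# T[a][s, s'] : probability of moving from state s to s' under action a.
T = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
# O[s', o] : probability of observing o when the hidden state is s'.
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def belief_update(b, a, o):
    """Bayes filter: b'(s') ∝ O(o | s') * sum_s T(s' | s, a) b(s)."""
    b_pred = b @ T[a]            # predict through the transition model
    b_new = O[:, o] * b_pred     # weight by the observation likelihood
    return b_new / b_new.sum()   # normalise to a distribution

b = np.array([0.5, 0.5])         # uniform initial belief
b = belief_update(b, a=0, o=1)
```

Planning in a POMDP means choosing actions as a function of this belief, which is what makes the problem so much harder than a fully observed MDP.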

While we can compute near-optimal decision plans using the POMDP framework, there are two aspects which limit its applicability to real-world problems: 1) POMDP is a model-based algorithm, which means we need an accurate environmental model prior to optimising decisions; 2) a vanilla POMDP model does not provide any guarantee on its performance. This limits its application to mission-critical problems where we often need to satisfy certain constraints, such as health and safety protocols.

This paper addresses the above limitations by integrating a sampling approach (Monte-Carlo Tree Search) with a constrained optimisation counterpart to the vanilla POMDP model (Constrained POMDPs). This allows us to make decisions optimally and safely, satisfying the given constraints, without requiring an a priori environmental model.
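As a rough illustration of the search component, here is a minimal UCT (Monte-Carlo Tree Search) sketch on a toy, fully observable chain problem. The paper's algorithm builds its tree over action-observation histories and enforces cost constraints, both of which this hypothetical sketch omits; it only shows how sampled playouts and UCB-based action selection replace the need for exhaustive model-based planning.

```python
import math, random

# Toy deterministic chain: action 1 moves right toward a rewarding goal state,
# action 0 stays put. All numbers are hypothetical.
ACTIONS = [0, 1]
GOAL, HORIZON = 4, 6

def step(s, a):
    s2 = min(s + a, GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)

# Statistics shared across the search, keyed by (state, action).
N, W = {}, {}    # visit counts and total sampled returns

def rollout(s, depth):
    """Random playout used to value newly expanded nodes."""
    total = 0.0
    for _ in range(depth):
        s, r = step(s, random.choice(ACTIONS))
        total += r
    return total

def uct(s, depth):
    if depth == 0:
        return 0.0
    # Expand: try each untried action once, valuing it with a rollout.
    for a in ACTIONS:
        if (s, a) not in N:
            s2, r = step(s, a)
            ret = r + rollout(s2, depth - 1)
            N[(s, a)], W[(s, a)] = 1, ret
            return ret
    # Select the action maximising the UCB score, then recurse.
    n_s = sum(N[(s, a)] for a in ACTIONS)
    a = max(ACTIONS, key=lambda a: W[(s, a)] / N[(s, a)]
            + math.sqrt(2 * math.log(n_s) / N[(s, a)]))
    s2, r = step(s, a)
    ret = r + uct(s2, depth - 1)
    N[(s, a)] += 1
    W[(s, a)] += ret
    return ret

random.seed(0)
for _ in range(2000):
    uct(0, HORIZON)
best = max(ACTIONS, key=lambda a: W[(0, a)] / N[(0, a)])   # best root action
```

Because the search only ever calls `step` as a black-box simulator, no explicit environmental model needs to be written down in advance, which is the property the paper exploits.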

These improvements could also benefit constrained POMDP applications such as autonomous robots, machine vision and behavioural ecology.

### Bandit Learning in Concave N-Person Games

**Authors:** Mario Bravo (Universidad de Santiago de Chile), Panayotis Mertikopoulos (Univ. Grenoble Alpes), David Leslie (PROWLER.io)

Under the bandit learning scenario, a system tries to optimise its performance when the only information available is how well the system has performed in the past. In particular, no information is available about what might have been achieved had other actions been taken, or how specific adjustments would improve the performance. So, the system must experiment to find out what works well. The problem is well understood when the set of actions available to the system is finite, and when only one learner is present.

In this paper, we consider bandit learning with the same minimal information available. However, the action sets are a continuum and there are multiple agents learning simultaneously.

This is a significantly harder challenge, but we developed a scheme which satisfies two important criteria. First, the learning algorithm of each agent is individually rational, in that it would make sense to use even in the absence of other learners. So, individuals do not even need to know they are playing a game. Second, the scheme is effective when there are multiple learners.

We also proved theoretical performance guarantees demonstrating that the regret of an individual under our scheme is not much worse than if no other learners were present.
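The core trick can be sketched with a single learner: estimate a gradient from one payoff query per round, then take a projected ascent step. Everything below (the payoff function, step-size schedules and projection radius) is a hypothetical toy for illustration, not the paper's scheme or its analysis.

```python
import numpy as np

# One-point bandit feedback: the learner observes only the realised payoff
# f(x + delta * u), never a gradient. A single perturbed query still gives
# an unbiased gradient estimate in expectation.

def f(x):
    """Hypothetical concave payoff, maximised at (1, -1)."""
    return -np.sum((x - np.array([1.0, -1.0])) ** 2)

def one_point_grad(f, x, delta, rng):
    """Estimate grad f(x) from a single payoff query at a perturbed point."""
    u = rng.normal(size=x.size)
    u /= np.linalg.norm(u)                 # uniform direction on the sphere
    return (x.size / delta) * f(x + delta * u) * u

def project(x, radius=2.0):
    """Keep iterates inside a compact action set (a ball here)."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

rng = np.random.default_rng(0)
x = np.zeros(2)
for t in range(1, 5001):
    delta = t ** -0.25                     # shrinking query radius
    eta = 0.1 * t ** -0.75                 # shrinking step size
    x = project(x + eta * one_point_grad(f, x, delta, rng))
```

In the multi-agent setting studied in the paper, each agent runs a scheme of this flavour on its own payoff while the payoff itself shifts with the other agents' actions, which is what makes the regret analysis substantially harder.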