Eight research papers accepted at NeurIPS 2018

The leading Artificial Intelligence company is proud to announce that eight papers produced by its researchers have been accepted for presentation at NeurIPS 2018, the 32nd Annual Conference on Neural Information Processing Systems, held in Montreal, Canada, on December 3–8, 2018. This year NeurIPS, the most prestigious machine learning event of its kind in the world, received a record-breaking 4,856 submissions, of which 1,011 were accepted. Three staff researchers have already received top-reviewer awards, and tickets for NeurIPS sold out this year within 12 minutes of going on sale.

Below you will find the abstracts of the papers our researchers will present.

Distributed Multitask Reinforcement Learning with Quadratic Convergence

Authors: Rasul Tutunov, Dongho Kim, Haitham Bou-Ammar

Abstract: Multitask reinforcement learning (MTRL) suffers from scalability issues when the number of tasks or trajectories grows large. The main reason behind this drawback is the reliance on centralised solutions. Recent methods have exploited the connection between MTRL and general consensus to propose scalable solutions. These methods, however, suffer from two drawbacks: first, they rely on predefined objectives, and second, they exhibit only linear convergence guarantees. In this paper, we improve over the state of the art by deriving multitask reinforcement learning from a variational inference perspective. We then propose a novel distributed solver for MTRL with quadratic convergence guarantees.
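To give a feel for why the jump from linear to quadratic convergence guarantees matters, here is a toy sketch, entirely unrelated to the paper's actual solver: it contrasts a quadratically convergent method (Newton's iteration) with a linearly convergent one (bisection) on the same simple root-finding problem.

```python
# Quadratic vs linear convergence, illustrated on finding sqrt(2),
# i.e. the root of g(x) = x^2 - 2 on [1, 2].

def newton_iters(tol=1e-10):
    """Newton's method: the error roughly squares each step (quadratic)."""
    x, n = 2.0, 0
    while abs(x * x - 2.0) > tol:
        x = 0.5 * (x + 2.0 / x)   # Newton step for x^2 - 2
        n += 1
    return n

def bisection_iters(tol=1e-10):
    """Bisection: the error only halves each step (linear)."""
    lo, hi, n = 1.0, 2.0, 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * mid < 2.0:
            lo = mid
        else:
            hi = mid
        n += 1
    return n

print(newton_iters(), bisection_iters())
```

Newton reaches 10-digit accuracy in a handful of iterations, while the linearly convergent method needs dozens; at scale, that gap is the practical payoff of a quadratic guarantee.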

Gaussian Process Conditional Density Estimation

Authors: Vincent Dutordoir, Hugh Salimbeni (Imperial College London), Marc Deisenroth (Imperial College London), James Hensman

Abstract: Conditional Density Estimation (CDE) models deal with estimating conditional distributions. The conditions imposed on the distribution are the inputs of the model. CDE is a challenging task as there is a fundamental trade-off between model complexity, representational capacity and overfitting. In this work, we propose to extend the model's input with latent variables and use Gaussian processes (GPs) to map this augmented input onto samples from the conditional distribution. Our Bayesian approach allows for the modeling of small datasets, but we also provide the machinery for it to be applied to big data using stochastic variational inference. Our approach can be used to model densities even in sparse data regions, and allows for sharing learned structure between conditions. We illustrate the effectiveness and wide-reaching applicability of our model on a variety of real-world problems, such as spatio-temporal density estimation of taxi drop-offs, non-Gaussian noise modeling, and few-shot learning on Omniglot images.
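A minimal sketch of the latent-augmentation idea. The function below is a hand-picked toy standing in for the GP mapping (its form is an illustrative assumption, not the paper's model): augmenting the input with a latent variable and pushing the pair through a nonlinear map yields conditional samples whose density no single Gaussian could represent.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_samples(x, n=1000):
    """Draw samples from p(y | x) by augmenting the input x with a
    latent w ~ N(0, 1) and mapping (x, w) through a nonlinear function.
    In the paper that function is a Gaussian process; here a fixed toy
    map stands in, producing a bimodal conditional density."""
    w = rng.standard_normal(n)
    return np.sin(x) + np.sign(w) + 0.1 * w

ys = conditional_samples(0.0)
# the samples cluster around sin(x) + 1 and sin(x) - 1: a bimodal
# conditional density induced purely by the latent augmentation
```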

Learning Invariances using the Marginal Likelihood

Authors: Mark van der Wilk, Matthias Bauer (University of Cambridge and the Max Planck Institute in Tübingen), ST John, James Hensman

Abstract: Generalising well in supervised learning tasks relies on correctly extrapolating the training data to a large region of the input space. One way to achieve this is to constrain the predictions to be invariant to transformations on the input that are known to be irrelevant (e.g. translation). Commonly, this is done through data augmentation, where the training set is enlarged by applying hand-crafted transformations to the inputs. We argue that invariances should instead be incorporated in the model structure, and learned using the marginal likelihood, which correctly rewards the reduced complexity of invariant models. We demonstrate this for Gaussian process models, due to the ease with which their marginal likelihood can be estimated. Our main contribution is a variational inference scheme for Gaussian processes containing invariances described by a sampling procedure. We learn the sampling procedure by back-propagating through it to maximise the marginal likelihood.
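The construction underlying invariant kernels can be sketched with a hypothetical two-element transformation group (sign flips) rather than the learned sampling procedure from the paper: averaging a base kernel over the orbits of both inputs yields a kernel whose GP function draws are invariant to the transformation.

```python
import numpy as np

def rbf(x, y, ell=1.0):
    """Standard RBF base kernel."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.exp(-np.sum((x - y) ** 2) / (2 * ell ** 2)))

def invariant_kernel(x, y):
    """Average the base kernel over the orbit of each input under the
    transformation group -- here simply {identity, negation}."""
    orbit = lambda z: [np.asarray(z, float), -np.asarray(z, float)]
    return float(np.mean([[rbf(a, b) for b in orbit(y)] for a in orbit(x)]))

k1 = invariant_kernel([1.0, 2.0], [0.5, -0.3])
k2 = invariant_kernel([-1.0, -2.0], [0.5, -0.3])
# k1 == k2: the kernel (and any GP built from it) ignores sign flips,
# which the plain RBF kernel does not
```

In the paper the orbit is replaced by a parametrised sampling procedure, and its parameters are learned by back-propagating through the samples to maximise the marginal likelihood.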

Bandit Learning in Concave N-Person Games

Authors: Mario Bravo (Universidad de Santiago de Chile), Panayotis Mertikopoulos (Univ. Grenoble Alpes), David Leslie

Abstract: This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents’ most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players’ behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.
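"Bandit feedback" here means each player observes only the payoff of the action actually played. A standard way to recover gradients from such feedback is the one-point estimator; the sketch below is a generic textbook construction (not code from the paper), checked against a known gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_point_gradient(f, x, delta=0.1, n=200000):
    """Payoff-only gradient estimate: query f at a single perturbed
    point x + delta * u, with u uniform on the unit sphere, and
    average (d / delta) * f(x + delta * u) * u over many queries."""
    d = x.size
    u = rng.standard_normal((n, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)        # unit sphere
    payoffs = np.array([f(x + delta * ui) for ui in u])  # one payoff per query
    return (d / delta) * (payoffs[:, None] * u).mean(axis=0)

f = lambda z: z[0] ** 2 + 2.0 * z[1] ** 2                # toy payoff function
est = one_point_gradient(f, np.array([1.0, 1.0]))
# the true gradient at (1, 1) is (2, 4); est approximates it from
# single payoff observations alone
```

Feeding such estimates into mirror descent is the learning scheme whose long-run behaviour the paper analyses.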

Orthogonally Decoupled Variational Gaussian Processes

Authors: Hugh Salimbeni (Imperial College London), Ching-An Cheng (Georgia Institute of Technology), Byron Boots (Georgia Institute of Technology), Marc Deisenroth (Imperial College London)

Abstract: Gaussian processes provide a powerful non-parametric framework for reasoning over functions. Despite an appealing theory, their superlinear computational and memory complexities have presented a long-standing challenge. The state-of-the-art methods of sparse variational inference trade modeling accuracy for complexity. However, their complexities still scale superlinearly in the number of basis functions, so they can learn from large datasets only when a small model is used. Recently, a decoupled approach was proposed to remove the unnecessary coupling between the complexities of modeling the mean and the covariance functions. It achieves a linear complexity in the number of mean parameters, so an expressive posterior mean function can be modeled. While promising, this approach suffers from optimization difficulties due to ill-conditioning and non-convexity. In this work, we propose an alternative decoupled parametrization. It adopts an orthogonal basis in the mean function to model the residues that cannot be learned by the standard coupled approach. Therefore, our method extends, rather than replaces, the coupled approach to achieve strictly better performance. This construction admits a straightforward natural gradient update rule, so the structure of the information manifold that is lost during decoupling can be leveraged to speed up learning. Empirically, our algorithm demonstrates significantly faster convergence in multiple experiments.
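The flavour of the "orthogonal basis for the residues" construction can be sketched with plain linear algebra (illustrative only; the paper works with GP basis functions, not finite random matrices): extra mean-function capacity is added in the orthogonal complement of the original basis, so it can only model what the coupled basis cannot.

```python
import numpy as np

rng = np.random.default_rng(0)

Phi = rng.standard_normal((50, 5))   # columns: the original (coupled) basis
v = rng.standard_normal(50)          # a candidate extra mean-function direction

# Project v onto the orthogonal complement of span(Phi): the residue
# component that the original basis cannot represent.
P = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)   # projector onto span(Phi)
residue = v - P @ v

# residue is orthogonal to every column of Phi, so adding it to the
# mean model strictly extends the basis instead of duplicating it
```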

Infinite Horizon Gaussian Processes

Authors: Arno Solin (Aalto University), Richard Turner (University of Cambridge) and James Hensman

Abstract: Gaussian processes provide a flexible framework for forecasting, removing noise, and interpreting long temporal datasets. State space modelling (Kalman filtering) enables these non-parametric models to be deployed on long datasets by reducing the complexity to linear in the number of data points. The complexity is still cubic in the state dimension m, however. In certain special cases (Gaussian likelihood, regular spacing) the GP posterior will reach a steady state when the data sequence is very long. We leverage this and formulate an inference scheme for GPs with general likelihoods, where inference is based on single-sweep EP (assumed density filtering). The infinite-horizon model tackles the cubic cost in the state dimension, reducing it to O(m²) per data point. The model is extended to online learning of hyperparameters. We show examples on large finite-length modelling problems, and demonstrate how the method runs in real time on a smartphone on a continuous data stream updated at 100 Hz.
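The steady-state behaviour the method exploits is visible in a scalar toy model (a generic Kalman-filter fact, not the paper's scheme): iterating the Riccati recursion, the predictive variance and the Kalman gain converge, after which the expensive covariance update can be frozen and only the cheaper mean update needs to run per data point.

```python
# Scalar state-space model: x_t = x_{t-1} + noise(q),  y_t = x_t + noise(r).
q, r = 0.1, 1.0   # process and observation noise variances
P = 1.0           # initial state variance
gains = []
for _ in range(100):
    P_pred = P + q                 # predict step (Riccati recursion)
    K = P_pred / (P_pred + r)      # Kalman gain
    P = (1.0 - K) * P_pred        # update step
    gains.append(K)

# successive gains stop changing: the filter has reached steady state
```

For this model the steady-state gain can also be computed in closed form (roughly 0.2702 here), confirming where the recursion settles; in the multivariate case, freezing these matrices is what drops the per-point cost from cubic to quadratic in the state dimension.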

A Bayesian Approach to Generative Adversarial Imitation Learning (Spotlight)

Authors: Wonseok Jeon (KAIST), Seokin Seo (KAIST), and Kee-Eung Kim (KAIST)

Abstract: Generative adversarial training for imitation learning has shown promising results on high-dimensional and continuous control tasks. This paradigm is based on reducing the imitation learning problem to a density matching problem, where the agent iteratively refines the policy to match the empirical state-action visitation frequency of the expert demonstration. Although this approach has been shown to robustly learn to imitate even from scarce demonstrations, one must still address the inherent challenge that collecting trajectory samples in each iteration is a costly operation. To address this issue, we first propose a Bayesian formulation of generative adversarial imitation learning (GAIL), where the imitation policy and the cost function are represented as stochastic neural networks. Then, we show that we can significantly enhance the sample efficiency of GAIL by leveraging the predictive density of the cost, on an extensive set of imitation learning tasks with high-dimensional states and actions.

Monte-Carlo Tree Search for Constrained POMDPs

Authors: Jongmin Lee (KAIST), Geon-hyeong Kim (KAIST), Pascal Poupart (University of Waterloo), and Kee-Eung Kim (KAIST)

Abstract: Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. However, many real-world problems inherently have multiple goals, where multi-objective formulations are more natural. The constrained POMDP (CPOMDP) is such a model that maximizes the reward while constraining the cost, extending the standard POMDP model. To date, solution methods for CPOMDPs assume an explicit model of the environment, and thus are hardly applicable to large-scale real-world problems. In this paper, we present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and only requires a black-box simulator of the environment. In the experiments, we demonstrate that CC-POMCP converges to the optimal stochastic action selection in CPOMDP and pushes the state-of-the-art by being able to scale to very large problems.
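A bandit-level toy of the scalarisation idea behind cost-constrained search (an illustrative sketch only; the real CC-POMCP adapts the multiplier from LP-induced parameters inside a POMCP tree): actions are ranked by reward − λ·cost, and λ is nudged toward the cost budget, which induces exactly the stochastic mixing a constrained problem requires.

```python
# Two actions with known (reward, cost); the cost budget is 0.5.
# The high-reward action alone violates the budget, so the optimal
# policy must mix -- which the adapted multiplier lambda induces.
actions = [(1.0, 1.0), (0.5, 0.0)]   # (reward, cost) pairs
budget, lam, lr = 0.5, 0.0, 0.01

costs = []
for _ in range(10000):
    # greedy choice on the scalarised value: reward - lambda * cost
    a = max(actions, key=lambda rc: rc[0] - lam * rc[1])
    costs.append(a[1])
    # push the long-run average cost toward the budget
    lam = max(0.0, lam + lr * (a[1] - budget))

mean_cost = sum(costs) / len(costs)
# lam settles near 0.5 and the empirical average cost meets the budget
```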

Join us to make AI that will change the world

join our team