List of research papers

  • Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models

    April 9 - 11, 2018 Playa Blanca, Lanzarote, Canary Islands

    The 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018)

    (not yet available online)

    Authors: Hugh Salimbeni (Imperial College), Stefanos Eleftheriadis, and James Hensman

    Abstract: The natural gradient method has been used effectively in conjugate Gaussian process models, but the non-conjugate case has been largely unexplored. We examine how natural gradients can be used in non-conjugate stochastic settings, together with hyperparameter learning. We conclude that the natural gradient can significantly improve performance in terms of wall-clock time. For ill-conditioned posteriors, the benefit of the natural gradient method is especially pronounced, and we demonstrate a practical setting where ordinary gradients are unusable. We show how natural gradients can be computed efficiently and automatically in any parameterization, using automatic differentiation.
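
    The paper applies natural gradients to Gaussian process variational parameters; as a minimal illustration of the natural-gradient idea itself, here is a sketch for a single Bernoulli parameter, where the Fisher information is available in closed form (p(1-p) with respect to the logit). The function names and the toy data are illustrative, not from the paper:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def natural_gradient_fit(ys, lr=1.0, steps=50):
    """Fit a Bernoulli mean by natural-gradient ascent on the logit theta."""
    theta = 0.0
    n = len(ys)
    for _ in range(steps):
        p = sigmoid(theta)
        grad = sum(ys) / n - p                   # ordinary gradient of the mean log-likelihood
        fisher = p * (1.0 - p)                   # Fisher information w.r.t. the logit
        theta += lr * grad / max(fisher, 1e-12)  # natural-gradient step: F^{-1} * grad
    return theta

ys = [1, 1, 0, 1, 0, 1, 1, 1]  # empirical mean 0.75
theta = natural_gradient_fit(ys)
print(round(sigmoid(theta), 3))  # converges to the empirical mean, 0.75
```

    Preconditioning the gradient by the inverse Fisher information makes the update invariant to how the distribution is parameterized, which is the property the paper exploits for arbitrary parameterizations via automatic differentiation.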

    Gaussian Processes, Natural Gradient, Optimisation, Probabilistic Modelling

  • Variational Fourier Features for Gaussian Processes

    Posted 04/01/2018 (No publication date yet)

    Journal of Machine Learning Research (JMLR)

    (not yet available online)

    Authors: James Hensman, Nicolas Durrande, and Arno Solin (Aalto University)

    Abstract: This work brings together two powerful concepts in Gaussian processes: the variational approach to sparse approximation and the spectral representation of Gaussian processes. This gives rise to an approximation that inherits the benefits of the variational approach but with the representational power and computational scalability of spectral representations. The work hinges on a key result that there exist spectral features related to a finite domain of the Gaussian process which exhibit almost-independent covariances. We derive these expressions for Matérn kernels in one dimension, and generalize to more dimensions using kernels with specific structures. Under the assumption of additive Gaussian noise, our method requires only a single pass through the data set, making for very fast and accurate computation. We fit a model to 4 million training points in just a few minutes on a standard laptop. With non-conjugate likelihoods, our MCMC scheme reduces the cost of computation from O(NM²) (for a sparse Gaussian process) to O(NM) per iteration, where N is the number of data points and M is the number of features.
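
    The single-pass, O(NM) flavour of Fourier-feature regression can be sketched with plain trigonometric features and ridge regression (this is not the paper's variational construction; frequencies, noise level, and data below are illustrative assumptions):

```python
import numpy as np

def fit_fourier_features(xs, ys, freqs, noise=0.1):
    """One pass over the data accumulates the M x M normal equations,
    which are then solved once: O(N M) to build, O(M^3) to solve."""
    M = 2 * len(freqs)
    A = noise**2 * np.eye(M)
    b = np.zeros(M)
    for x, y in zip(xs, ys):  # single pass through the data set
        phi = np.concatenate([np.cos(freqs * x), np.sin(freqs * x)])
        A += np.outer(phi, phi)
        b += phi * y
    return np.linalg.solve(A, b)

def predict(w, x, freqs):
    phi = np.concatenate([np.cos(freqs * x), np.sin(freqs * x)])
    return phi @ w

rng = np.random.default_rng(0)
xs = rng.uniform(0, 2 * np.pi, 200)
ys = np.sin(xs) + 0.05 * rng.standard_normal(200)
freqs = np.arange(1, 6, dtype=float)  # 5 frequencies -> M = 10 features
w = fit_fourier_features(xs, ys, freqs)
print(round(predict(w, np.pi / 2, freqs), 2))  # close to sin(pi/2) = 1
```

    Because only the M×M accumulated statistics are kept, memory never grows with N, which is what makes the millions-of-points regime feasible.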

    Fourier Features, Gaussian Processes, Probabilistic Modelling, Variational Inference

  • Distributed Lifelong Reinforcement Learning with Sub-Linear Regret

    17-19 December 2018, Miami Beach

    (To appear – not yet available online)

    IEEE Conference on Decision and Control (CDC)

    Authors: Julia El-Zini, Rasul Tutunov, Haitham Bou-Ammar, and Ali Jadbabaie

    Abstract: In this paper, we propose a distributed second-order method for lifelong reinforcement learning (LRL). Upon observing a new task, our algorithm scales state-of-the-art LRL by approximating the Newton direction up to any arbitrary precision ε > 0, while guaranteeing accurate solutions. We analyze the theoretical properties of this new method and derive, for the first time to the best of our knowledge, sublinear regret under this setting.

    Distributed Optimisation, Learning to Learn, Lifelong Learning, Online Learning, Reinforcement Learning

  • Scalable Lifelong Reinforcement Learning

    December 2017

    Pattern Recognition, Volume 72, Dec. 2017

    Authors: Yusen Zhan, Haitham Bou-Ammar, and Matthew E. Taylor

    Abstract: Lifelong reinforcement learning provides a successful framework for agents to learn multiple consecutive tasks sequentially. Current methods, however, suffer from scalability issues when the agent has to solve a large number of tasks. In this paper, we remedy the above drawbacks and propose a novel scalable technique for lifelong reinforcement learning. We derive an algorithm which assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange. We then show that our method attains a linear convergence rate, improving on current lifelong policy search methods. Finally, we evaluate our technique on a set of benchmark dynamical systems and demonstrate learning speed-ups and reduced running times.

    Data Efficiency, Distributed Optimisation, Learning to Learn, Lifelong Learning, Multitask Learning, Online Learning, Reinforcement Learning, Scalability

  • Correctness-by-Learning of Infinite-State Component-Based Systems

    10-13 October 2017, Braga, Portugal

    International Conference on Formal Aspects of Component Software

    Authors: Haitham Bou-Ammar, Mohamad Jaber, and Mohammad Nassar

    Abstract: We introduce a novel framework for runtime enforcement of safe executions in component-based systems with multi-party interactions modelled using BIP. Our technique frames runtime enforcement as a sequential decision-making problem and presents two alternatives for learning optimal strategies that ensure fairness between correct traces. We target both finite and infinite state-spaces. In the finite case, we guarantee that the system avoids bad states by casting the learning process as one of determining a fixed-point solution that converges to the optimal strategy. Though successful, this technique fails to generalize to the infinite case due to the need for building a dictionary, which quantifies the performance of each state-interaction pair. As such, we further contribute by generalizing our framework to support the infinite setting. Here, we adapt ideas from function approximators and machine learning to encode the performance of each state-interaction pair. In essence, we autonomously learn to abstract similarly performing states into a relevant continuous space through the use of deep learning. We assess our method empirically by presenting a fully implemented tool, called RERL. In particular, we use RERL to: 1) enforce deadlock freedom on a dining philosophers benchmark, and 2) allow pair-wise synchronized robots to autonomously achieve consensus within a cooperative multi-agent setting.
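
    In the finite case, the fixed-point idea can be caricatured with plain value iteration in which bad states are assigned value -inf, so the greedy strategy learns to route around them (the toy chain MDP below is an illustrative assumption, not RERL's BIP machinery):

```python
def value_iteration(n_states=6, bad={3}, goal=5, gamma=0.9, iters=100):
    """Fixed-point iteration over state values; entering a bad state is
    worth -inf, so the converged greedy strategy avoids it."""
    V = [0.0] * n_states
    actions = {'step': 1, 'hop': 2}
    def q(s, a):
        s2 = min(s + actions[a], n_states - 1)
        if s2 in bad:
            return float('-inf')  # never enter a bad state
        r = 1.0 if s2 == goal else 0.0
        return r + gamma * V[s2]
    for _ in range(iters):
        for s in range(n_states):
            if s == goal or s in bad:
                continue
            V[s] = max(q(s, a) for a in actions)
    policy = {s: max(actions, key=lambda a: q(s, a))
              for s in range(n_states) if s != goal and s not in bad}
    return V, policy

V, policy = value_iteration()
print(policy[2])  # 'hop' -- jumps over the bad state 3
```

    The abstract's point is precisely that this tabular construction breaks down for infinite state spaces, where the dictionary of state-interaction values must be replaced by a learned function approximator.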

    Autonomous Formal Verification Methods, Reinforcement Learning, Software Engineering

  • Non-convex Policy Search Using Variational Inequalities

    October 2017

    Neural Computation, Volume 29, Issue 10, MIT Press, Oct. 2017

    Authors: Yusen Zhan, Haitham Bou-Ammar, and Matthew E. Taylor

    Abstract: Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that could potentially damage hardware units. Motivated by such constraints, projection-based methods have been proposed for obtaining safe policies. These methods, however, can handle only convex policy constraints. In this letter, we propose the first safe policy search reinforcement learner capable of operating under nonconvex policy constraints. This is achieved by observing, for the first time, a connection between nonconvex variational inequalities and policy search problems. We provide two algorithms, Mann and two-step iteration, to solve the above problems and prove convergence in the nonconvex stochastic setting. Finally, we demonstrate the performance of the algorithms on six benchmark dynamical systems and show that our new method is capable of outperforming previous methods under a variety of settings.
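
    The Mann scheme named in the abstract is a fixed-point averaging iteration; a minimal numeric sketch on a toy one-dimensional map (not a policy-search objective; the constant averaging weight is an illustrative choice):

```python
import math

def mann_iteration(T, x0, alpha=0.5, steps=100):
    """Krasnoselskii-Mann iteration: average the current iterate with
    its image under T, i.e. x_{k+1} = (1 - alpha) x_k + alpha T(x_k)."""
    x = x0
    for _ in range(steps):
        x = (1 - alpha) * x + alpha * T(x)
    return x

x_star = mann_iteration(math.cos, x0=0.0)
print(round(x_star, 3))  # 0.739, the fixed point of cos(x)
```

    The averaging is what buys convergence for merely nonexpansive operators, where the plain iterate x_{k+1} = T(x_k) may oscillate.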

    Non-Convexity, Optimisation, Reinforcement Learning

  • An Information-Theoretic On-Line Update Principle for Perception-Action Coupling

    24-28 September 2017, Vancouver, Canada

    (not yet available online)

    IEEE/RSJ International Conference on Intelligent Robots and Systems

    Authors: Zhen Peng, Tim Genewein, Felix Leibfried, and Daniel Braun

    Abstract: Inspired by findings of sensorimotor coupling in humans and animals, there has recently been a growing interest in the interaction between action and perception in robotic systems. Here we consider perception and action as two serial information channels with limited information-processing capacity. We formulate a constrained optimization problem that maximizes utility under limited information-processing capacity in the two channels. As a solution, we obtain an optimal perceptual channel and an optimal action channel that are coupled such that perceptual information is optimized with respect to downstream processing in the action module. The main novelty of this study is that we propose an online optimization procedure to find bounded-optimal perception and action channels in parameterized serial perception-action systems. In particular, we implement the perceptual channel as a multi-layer neural network and the action channel as a multinomial distribution. We illustrate our method in a NAO robot simulator with a simplified cup-lifting task.
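
    Utility maximization under an information-processing constraint has a well-known self-consistent solution of the form p(a|s) ∝ p(a) exp(β U(s,a)). A Blahut-Arimoto-style alternation for a single channel can be sketched as follows (the 2-state, 2-action utility matrix and the inverse temperature β are illustrative assumptions, not the paper's setup):

```python
import math

def bounded_rational_policy(U, p_s, beta, iters=50):
    """Alternate between the posterior p(a|s) ∝ p(a) exp(beta * U[s][a])
    and the marginal prior p(a) = sum_s p(s) p(a|s)."""
    n_s, n_a = len(U), len(U[0])
    p_a = [1.0 / n_a] * n_a
    for _ in range(iters):
        post = []
        for s in range(n_s):
            w = [p_a[a] * math.exp(beta * U[s][a]) for a in range(n_a)]
            z = sum(w)
            post.append([x / z for x in w])
        p_a = [sum(p_s[s] * post[s][a] for s in range(n_s)) for a in range(n_a)]
    return post, p_a

U = [[1.0, 0.0], [0.0, 1.0]]  # each state prefers a different action
post, p_a = bounded_rational_policy(U, p_s=[0.5, 0.5], beta=5.0)
print(round(post[0][0], 2))  # high beta -> nearly deterministic choice
```

    Low β yields near-uniform (cheap) policies, high β yields near-deterministic (expensive) ones; the paper's contribution is doing this trade-off online for two coupled, parameterized channels.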

    Bounded-Rationality, Information Theory, Reinforcement Learning

  • A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games

    6-11 August 2017, Sydney, Australia

    Workshop on Principled Approaches to Deep Learning at the International Conference on Machine Learning

    Authors: Felix Leibfried, Nate Kushman, and Katja Hofmann

    Abstract: Reinforcement learning is concerned with identifying reward-maximizing behaviour policies in environments that are initially unknown. State-of-the-art reinforcement learning approaches, such as deep Q-networks, are model-free and learn to act effectively across a wide range of environments such as Atari games, but require huge amounts of data. Model-based techniques are more data-efficient but need to acquire explicit knowledge about the environment. In this paper, we take a step towards using model-based techniques in environments with a high-dimensional visual state space by demonstrating that it is possible to learn system dynamics and the reward structure jointly. Our contribution is to extend a recently developed deep neural network for video frame prediction in Atari games to enable reward prediction as well. To this end, we phrase a joint optimization problem for minimizing both video frame and reward reconstruction loss, and adapt network parameters accordingly. Empirical evaluations on five Atari games demonstrate accurate cumulative reward prediction of up to 200 frames. We consider these results as opening up important directions for model-based reinforcement learning in complex, initially unknown environments.
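
    The joint optimization amounts to a single objective summing the two losses; a minimal sketch (the λ weight and the use of mean-squared error for both terms are illustrative assumptions, not the paper's exact loss):

```python
import numpy as np

def joint_loss(frame_pred, frame_true, reward_pred, reward_true, lam=1.0):
    """Joint objective: frame-reconstruction error plus a weighted
    reward-prediction error, minimized over shared network parameters."""
    frame_term = np.mean((frame_pred - frame_true) ** 2)
    reward_term = np.mean((reward_pred - reward_true) ** 2)
    return frame_term + lam * reward_term

# perfect frame prediction but wrong rewards still incurs a penalty:
loss = joint_loss(np.zeros((4, 4)), np.zeros((4, 4)),
                  np.array([1.0, 1.0]), np.array([0.0, 0.0]))
print(loss)  # 1.0
```

    Sharing the encoder between the two heads is what lets reward prediction piggyback on the representation already learned for video-frame prediction.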

    ATARI Domain, Deep learning, Reinforcement Learning

  • Efficiently detecting switches against non-stationary opponents

    July 2017

    Autonomous Agents and Multi-Agent Systems, 31: 767 (July 2017)

    Authors: Pablo Hernandez-Leal, Yusen Zhan, Matthew E. Taylor, L. Enrique Sucar, and Enrique Munoz de Cote

    Abstract: Interactions in multiagent systems are generally more complicated than single-agent ones. Game theory provides solutions on how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real-world scenarios where agents have limited capacities and may deviate from a perfectly rational response. Our goal is still to act optimally in these cases by learning the appropriate response, without any prior policies on how to act. Thus, we focus on the problem in which another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, which is difficult for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses it to obtain an optimal policy, and then (3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER is guaranteed to detect switches with high probability, and empirical results showing that our approach outperforms state-of-the-art algorithms in normal-form games such as the prisoner’s dilemma and in a more realistic scenario, the Power TAC simulator.
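
    The third step, deciding when to re-learn, can be caricatured as monitoring the opponent model's recent prediction quality (the sliding window and threshold below are illustrative stand-ins, not DriftER's statistical test):

```python
from collections import deque

def detect_switch(predictions, actual, window=20, threshold=0.4):
    """Return the first round at which the opponent model's error rate
    over a sliding window exceeds the threshold, else None."""
    errors = deque(maxlen=window)
    for t, (p, a) in enumerate(zip(predictions, actual)):
        errors.append(p != a)
        if len(errors) == window and sum(errors) / window > threshold:
            return t
    return None

# opponent cooperates for 50 rounds, then defects; the stale model keeps
# predicting cooperation, so its error rate climbs after the switch
actual = ['C'] * 50 + ['D'] * 50
predicted = ['C'] * 100
print(detect_switch(predicted, actual))  # fires shortly after round 50
```

    The window length trades off detection delay against false alarms, which is the essence of the high-probability guarantee in the paper.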

    Game Theory, Learning, Multiagent Systems, Non-Stationary environments, Repeated Games, Switching strategies

  • Differential evolution strategies for large-scale energy resource management in smart grids

    15-19 July 2017

    Proceedings of the Genetic and Evolutionary Computation Conference, July 15-19 2017

    Authors: Fernando Lezama, Enrique Sucar, Joao Soares, Zita Vale, and Enrique Munoz de Cote

    Abstract: Smart Grid (SG) technologies are leading the modifications of power grids worldwide. The Energy Resource Management (ERM) in SGs is a highly complex problem that needs to be efficiently addressed to maximize incomes while minimizing operational costs. Due to the nature of the problem, which includes mixed-integer variables and non-linear constraints, Evolutionary Algorithms (EA) are considered a good tool to find optimal and near-optimal solutions to large-scale problems. In this paper, we analyze the application of Differential Evolution (DE) to solve the large-scale ERM problem in SGs through extensive experimentation on a case study using a 33-Bus power network with high penetration of Distributed Energy Resources (DER) and Electric Vehicles (EVs), as well as advanced features such as energy stock exchanges and Demand Response (DR) programs. We analyze the impact of DE parameter setting on four state-of-the-art DE strategies. Moreover, DE strategies are compared with other well-known EAs and a deterministic approach based on MINLP. Results suggest that, even when DE strategies are very sensitive to the setting of their parameters, they can find better solutions than other EAs, and near-optimal solutions in acceptable times compared with an MINLP approach.
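
    As a concrete reference for the DE strategies compared in the paper, here is a minimal DE/rand/1/bin loop; the sphere function stands in for the ERM objective, and the population size, F, and CR values are illustrative defaults, not the paper's tuned settings:

```python
import random

def differential_evolution(f, bounds, pop_size=30, F=0.8, CR=0.9, gens=200):
    """DE/rand/1/bin: mutate with a scaled difference of two random
    members, apply binomial crossover, keep the trial if it is no worse."""
    dim = len(bounds)
    rng = random.Random(0)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jrand = rng.randrange(dim)  # at least one coordinate mutates
            trial = [pop[a][j] + F * (pop[b][j] - pop[c][j])
                     if (rng.random() < CR or j == jrand) else pop[i][j]
                     for j in range(dim)]
            ft = f(trial)
            if ft <= fit[i]:  # greedy selection
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

sphere = lambda x: sum(v * v for v in x)
x, fx = differential_evolution(sphere, [(-5, 5)] * 3)
print(round(fx, 6))  # near 0
```

    The abstract's sensitivity finding corresponds to how strongly runs like this depend on the F and CR settings; mixed-integer ERM variables would additionally require rounding or repair, which this sketch omits.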

    Differential Evolution, Evolutionary Algorithms, Multiagent Systems

  • Distributed Newton for Network Flow Optimisation

    (To appear – not yet available online)

    SIAM Journal on Optimization

    Authors: Rasul Tutunov, Haitham Bou-Ammar, and Ali Jadbabaie

    Abstract: In this paper, we propose a new distributed second-order method for network flow optimization. Our algorithm exploits the symmetry and diagonal dominance of the dual Hessian to determine the Newton direction in a distributed fashion up to any precision ε > 0. This is achieved by first introducing a novel distributed solver for systems of equations described by symmetric diagonally dominant matrices, based on Chebyshev polynomials. Our solver is then used to compute the Newton direction, leading to a novel algorithm exhibiting similar phases of convergence to the exact (i.e., centralized) Newton method. We rigorously analyze the theoretical properties of both the solver and the distributed Newton method. Finally, we provide empirical validation for the proposed technique on a variety of network topologies.
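
    The Chebyshev-polynomial solver at the heart of the method can be illustrated in its centralized form; below is a sketch for a small symmetric diagonally dominant system, using Gershgorin bounds on the spectrum (the distributed, message-passing version is the paper's contribution and is not reproduced here):

```python
import numpy as np

def chebyshev_solve(A, b, lam_min, lam_max, iters=50):
    """Chebyshev iteration for A x = b, given bounds [lam_min, lam_max]
    on the spectrum of the symmetric positive-definite matrix A. It needs
    no inner products, only matrix-vector products."""
    d = (lam_max + lam_min) / 2.0
    c = (lam_max - lam_min) / 2.0
    x = np.zeros_like(b)
    r = b - A @ x
    p, alpha = None, 0.0
    for i in range(iters):
        if i == 0:
            p, alpha = r.copy(), 1.0 / d
        else:
            beta = 0.5 * (c * alpha) ** 2 if i == 1 else (c * alpha / 2.0) ** 2
            alpha = 1.0 / (d - beta / alpha)
            p = r + beta * p
        x = x + alpha * p
        r = b - A @ x
    return x

# symmetric diagonally dominant matrix; Gershgorin bounds the spectrum in [2, 6]
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = chebyshev_solve(A, b, lam_min=2.0, lam_max=6.0)
print(np.allclose(A @ x, b))  # True
```

    Because each sweep uses only local matrix-vector products and fixed scalar coefficients, the recursion is a natural candidate for distribution over a network, which is the property the paper builds on.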

    Data Efficiency, Graph Theory, Network Flow, Optimisation
