Our AI research

Recent papers by PROWLER.io staff and advisors

Three research teams. Three focus areas. 27 papers.
One goal: advancing Principled AI for decision-making.

Probabilistic Modelling

Reinforcement Learning

Multi-agent Systems

Many-Body Coarse-Grained Interactions Using Gaussian Approximation Potentials

Presented 20 February 2018, University of Manchester

Authors: S.T. John (PROWLER.io), Gábor Csányi (University of Cambridge)

The Journal of Physical Chemistry B (J. Phys. Chem. B 121, 48, 10934-10949)

Abstract: We introduce a computational framework that is able to describe general many-body coarse-grained (CG) interactions of molecules and use it to model the free energy surface of molecular liquids as a cluster expansion in terms of monomer, dimer, and trimer terms. The contributions to the free energy due to these terms are inferred from all-atom molecular dynamics (MD) data using Gaussian Approximation Potentials, a type of machine-learning model that employs Gaussian process regression. The resulting CG model is much more accurate than those possible using pair potentials. Though slower than the latter, our model can still be faster than all-atom simulations for solvent-free CG models commonly used in biomolecular simulations.
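
For readers unfamiliar with Gaussian Approximation Potentials, the sketch below shows the underlying machinery, Gaussian process regression with a squared-exponential kernel, fitting a scalar free-energy-like target from descriptor vectors. It is a minimal illustration under invented data and hyperparameters, not the paper's actual CG descriptors or kernel.

import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between two sets of descriptor vectors.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

# Hypothetical training data: descriptors of CG configurations -> free energies.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = np.sin(X_train).sum(axis=1) + 0.05 * rng.normal(size=50)

noise = 1e-2
K = rbf_kernel(X_train, X_train) + noise * np.eye(50)
alpha = np.linalg.solve(K, y_train)           # weights of the GP posterior mean

X_test = rng.normal(size=(5, 3))
y_pred = rbf_kernel(X_test, X_train) @ alpha  # posterior mean prediction
print(y_pred)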

Gaussian Processes

High-Dimensional Representation

Probabilistic Modelling

DeepCoder: Semi-parametric Variational Autoencoders for Automatic Facial Action Coding

Published October 2017

The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 3190-3199

Authors: Dieu Linh Tran, Robert Walecki, Ognjen (Oggi) Rudovic, Stefanos Eleftheriadis, Bjorn Schuller, and Maja Pantic

Abstract: The human face exhibits an inherent hierarchy in its representations (i.e., holistic facial expressions can be encoded via a set of facial action units (AUs) and their intensities). Variational (deep) auto-encoders (VAE) have shown great results in unsupervised extraction of hierarchical latent representations from large amounts of image data, while being robust to noise and other undesired artifacts. This potentially makes VAEs a suitable approach for learning facial features for AU intensity estimation. Yet, most existing VAE-based methods apply classifiers learned separately from the encoded features. By contrast, non-parametric (probabilistic) approaches, such as Gaussian Processes (GPs), typically outperform their parametric counterparts, but cannot deal easily with large amounts of data. To this end, we propose a novel semi-parametric VAE framework, named DeepCoder, which combines the modeling power of parametric (convolutional) and non-parametric (ordinal GP) VAEs for joint learning of (1) latent representations at multiple levels in a task hierarchy, and (2) classification of multiple ordinal outputs. We show on benchmark datasets for AU intensity estimation that the proposed DeepCoder outperforms state-of-the-art approaches and related VAEs and deep learning models.
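
As a rough illustration of the parametric half of this combination, here is a minimal variational auto-encoder in PyTorch with the standard reparameterization trick and ELBO-style loss. It is a generic sketch with assumed toy dimensions, not DeepCoder's two-level architecture or its ordinal GP component.

import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    # Minimal VAE: the parametric half of what DeepCoder combines with GPs.
    def __init__(self, x_dim=64, z_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU())
        self.mu, self.logvar = nn.Linear(32, z_dim), nn.Linear(32, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar

def elbo_loss(x_hat, x, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    recon = ((x_hat - x) ** 2).sum()
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum()
    return recon + kl

vae = TinyVAE()
x = torch.randn(16, 64)                       # stand-in for face-image features
x_hat, mu, logvar = vae(x)
print(elbo_loss(x_hat, x, mu, logvar).item())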

Deep Learning

Gaussian Processes

Probabilistic Modelling

Gaussian Process Domain Experts for Modeling of Facial Affect

Published 28 June 2017

IEEE Transactions on Image Processing

Authors: Stefanos Eleftheriadis (PROWLER.io), Ognjen Rudovic (MIT Media Lab), Marc Peter Deisenroth (PROWLER.io, Imperial College London), and Maja Pantic (Imperial College London).

Abstract: Most existing models for facial behavior analysis rely on generic classifiers, which fail to generalize well to previously unseen data. This is because of inherent differences in source (training) and target (test) data, mainly caused by variation in subjects' facial morphology, camera views, and so on. All of these account for different contexts in which target and source data are recorded, and thus may adversely affect the performance of models learned solely from source data. In this paper, we exploit the notion of domain adaptation and propose a data-efficient approach to adapt already learned classifiers to new unseen contexts. Specifically, we build upon the probabilistic framework of Gaussian processes (GPs), and introduce domain-specific GP experts (e.g., for each subject). The model adaptation is facilitated in a probabilistic fashion, by conditioning the target expert on the predictions from multiple source experts. We further exploit the predictive variance of each expert to define an optimal weighting during inference. We evaluate the proposed model on three publicly available data sets for multi-class (MultiPIE) and multi-label (DISFA, FERA2015) facial expression analysis by performing adaptation of two contextual factors: “where” (view) and “who” (subject). In our experiments, the proposed approach consistently outperforms: 1) both source and target classifiers, while using a small number of target examples during adaptation, and 2) related state-of-the-art approaches for supervised domain adaptation.
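
The key inference idea, weighting each expert by its predictive variance, can be sketched in a few lines. This is the generic inverse-variance (product-of-Gaussians) fusion rule with hypothetical numbers; the paper's actual conditioning and weighting scheme is more elaborate.

import numpy as np

def combine_experts(means, variances):
    # Weight each domain expert's prediction by its inverse predictive
    # variance, so confident (low-variance) experts dominate the fusion.
    means, variances = np.asarray(means), np.asarray(variances)
    precision = 1.0 / variances
    fused_var = 1.0 / precision.sum(axis=0)
    fused_mean = fused_var * (precision * means).sum(axis=0)
    return fused_mean, fused_var

# Three hypothetical experts predicting the same test point.
mu, var = combine_experts([0.9, 0.4, 0.6], [0.05, 1.0, 0.5])
print(mu, var)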

Data Efficiency

Domain Adaptation

Gaussian Processes

Probabilistic Modelling

Distributed Lifelong Reinforcement Learning with Sub-Linear Regret

17-19 December 2018, Miami Beach

IEEE Conference on Decision and Control (CDC), Miami Beach, Dec. 17-19, 2018 (To Appear)

Authors: Julia El-Zini, Rasul Tutunov (PROWLER.io), Haitham Bou-Ammar (PROWLER.io), and Ali Jadbabaie

Abstract: In this paper, we propose a distributed second-order method for lifelong reinforcement learning (LRL). Upon observing a new task, our algorithm scales state-of-the-art LRL by approximating the Newton direction up to any arbitrary precision ε > 0, while guaranteeing accurate solutions. We analyze the theoretical properties of this new method and derive, for the first time to the best of our knowledge, sublinear regret for this setting.
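
A centralized stand-in for the core numerical step, approximating the Newton direction H d = -g up to a residual tolerance ε, can be written with conjugate gradients. The paper's contribution is doing this in a distributed fashion with regret guarantees, which this sketch does not attempt; the toy Hessian and names are hypothetical.

import numpy as np

def newton_direction(H, g, eps=1e-6, max_iter=100):
    # Conjugate-gradient solve of H d = -g down to residual norm eps,
    # a serial stand-in for the paper's distributed up-to-epsilon solver.
    d = np.zeros_like(g)
    r = -g - H @ d
    p = r.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) < eps:
            break
        Hp = H @ p
        a = (r @ r) / (p @ Hp)
        d += a * p
        r_new = r - a * Hp
        b = (r_new @ r_new) / (r @ r)
        p = r_new + b * p
        r = r_new
    return d

H = np.array([[4.0, 1.0], [1.0, 3.0]])   # toy positive-definite Hessian
g = np.array([1.0, 2.0])
print(newton_direction(H, g))            # compare: -np.linalg.solve(H, g)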

Distributed Optimisation

Learning to Learn

Lifelong Learning

Online Learning

Reinforcement Learning

Scalable Lifelong Reinforcement Learning

Published December 2017

Pattern Recognition, Volume 72, Dec. 2017

Authors: Yusen Zhan, Haitham Bou-Ammar (PROWLER.io), and Matthew E. Taylor 

Abstract: Lifelong reinforcement learning provides a successful framework for agents to learn multiple consecutive tasks sequentially. Current methods, however, suffer from scalability issues when the agent has to solve a large number of tasks. In this paper, we remedy the above drawbacks and propose a novel scalable technique for lifelong reinforcement learning. We derive an algorithm which assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange. We then show that our method attains a linear convergence rate, improving on current lifelong policy search methods. Finally, we evaluate our technique on a set of benchmark dynamical systems and demonstrate learning speed-ups and reduced running times.
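
The shared-repository idea in lifelong policy search can be sketched as a factorization theta_t ≈ L s_t of each task's policy parameters into a shared basis L and task-specific codes s_t, updated alternately. This is a simplified, centralized caricature with synthetic data, not the paper's distributed algorithm.

import numpy as np

# Each task's policy parameters factor as theta_t = L @ s_t, with a shared
# repository L and per-task codes s_t; minimal ridge-regularised alternation.
rng = np.random.default_rng(1)
d, k, n_tasks, mu = 10, 3, 5, 1e-3
L = rng.normal(size=(d, k))                            # shared repository
thetas = [rng.normal(size=d) for _ in range(n_tasks)]  # stand-ins for per-task
                                                       # policy-gradient solutions
for _ in range(20):
    S = [np.linalg.solve(L.T @ L + mu * np.eye(k), L.T @ th) for th in thetas]
    Theta, Smat = np.stack(thetas, 1), np.stack(S, 1)  # d x T and k x T
    L = Theta @ Smat.T @ np.linalg.inv(Smat @ Smat.T + mu * np.eye(k))

print(np.linalg.norm(Theta - L @ Smat))  # reconstruction error after updates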

Data Efficiency

Distributed Optimisation

Learning to Learn

Lifelong Learning

Multitask Learning

Online Learning

Reinforcement Learning

Scalability

Correctness-by-Learning of Infinite-State Component-Based Systems

10-13 October 2017, Braga, Portugal

International Conference on Formal Aspects of Component Software, Braga, Portugal, 10-13 Oct. 2017 

Authors: Haitham Bou-Ammar (PROWLER.io), Mohamad Jaber, and Mohammad Nassar 

Abstract: We introduce a novel framework for runtime enforcement of safe executions in component-based systems with multi-party interactions modelled using BIP. Our technique frames runtime enforcement as a sequential decision-making problem and presents two alternatives for learning optimal strategies that ensure fairness between correct traces. We target both finite and infinite state-spaces. In the finite case, we guarantee that the system avoids bad states by casting the learning process as one of determining a fixed-point solution that converges to the optimal strategy. Though successful, this technique fails to generalize to the infinite case due to the need for building a dictionary, which quantifies the performance of each state-interaction pair. As such, we further contribute by generalizing our framework to support the infinite setting. Here, we adapt ideas from function approximation and machine learning to encode each state-interaction pair's performance. In essence, we autonomously learn to abstract similarly performing states in a relevant continuous space through the use of deep learning. We assess our method empirically by presenting a fully implemented tool called RERL. In particular, we use RERL to: 1) enforce deadlock freedom on a dining philosophers benchmark, and 2) allow pair-wise synchronized robots to autonomously achieve consensus within a cooperative multi-agent setting.
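
In the finite case, the fixed-point computation the abstract mentions is essentially value iteration over state-interaction pairs, with bad states heavily penalized. A toy illustration, where the chain, rewards, and penalty are invented for this sketch:

import numpy as np

# Toy chain: states 0..3, where state 3 is a "bad" (e.g., deadlock) state.
# Actions: 0 = safe interaction, 1 = risky interaction; next_state[s][a].
n_states, n_actions, gamma = 4, 2, 0.9
next_state = np.array([[1, 3], [2, 3], [2, 2], [3, 3]])
reward = np.where(next_state == 3, -100.0, 1.0)  # entering state 3 is costly

Q = np.zeros((n_states, n_actions))
for _ in range(200):                             # fixed-point iteration
    V = Q.max(axis=1)
    Q = reward + gamma * V[next_state]

policy = Q.argmax(axis=1)
print(policy)  # the learned strategy avoids interactions leading to state 3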

Autonomous Formal Verification Methods

Reinforcement Learning

Software Engineering

Non-convex Policy Search Using Variational Inequalities

Published October 2017

Neural Computation, Volume 29, Issue 10, MIT Press, Oct. 2017

Authors: Yusen Zhan, Haitham Bou-Ammar (PROWLER.io), and Matthew E. Taylor 

Abstract: Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that could potentially damage hardware units. Motivated by such constraints, projection-based methods have been proposed for safe policies.

These methods, however, can handle only convex policy constraints. In this letter, we propose the first safe policy search reinforcement learner capable of operating under nonconvex policy constraints. This is achieved by observing, for the first time, a connection between nonconvex variational inequalities and policy search problems. We provide two algorithms, Mann and two-step iteration, to solve the above problems and prove convergence in the nonconvex stochastic setting. Finally, we demonstrate the performance of the algorithms on six benchmark dynamical systems and show that our new method is capable of outperforming previous methods under a variety of settings.
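
Of the two algorithms, the Mann iteration is the simpler to sketch: average the current iterate with a projected step through the variational-inequality operator F. The toy operator and box constraint below are hypothetical; the paper's setting allows nonconvex constraints and stochastic feedback.

import numpy as np

def mann_iteration(F, project, x0, gamma=0.1, alpha=0.5, steps=500):
    # Mann iteration for a variational inequality: average the current
    # iterate with a projected step T(x) = Proj_C(x - gamma * F(x)).
    x = x0
    for _ in range(steps):
        x = (1 - alpha) * x + alpha * project(x - gamma * F(x))
    return x

# Toy instance: F(x) = x - b on the box C = [0, 1]^2, a stand-in for a
# policy-constraint set (the paper handles nonconvex constraint sets).
b = np.array([0.3, 1.7])
F = lambda x: x - b
project = lambda x: np.clip(x, 0.0, 1.0)
print(mann_iteration(F, project, np.zeros(2)))  # -> approx [0.3, 1.0]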

Non-Convexity

Optimisation

Reinforcement Learning

An Information-Theoretic On-Line Update Principle for Perception-Action Coupling

24-28 September 2017, Vancouver, Canada

IEEE/RSJ International Conference on Intelligent Robots and Systems, Vancouver, Canada, Sep 24-28, 2017

Authors: Zhen Peng, Tim Genewein, Felix Leibfried (PROWLER.io), and Daniel Braun

Abstract: Inspired by findings of sensorimotor coupling in humans and animals, there has recently been a growing interest in the interaction between action and perception in robotic systems. Here we consider perception and action as two serial information channels with limited information-processing capacity. We formulate a constrained optimization problem that maximizes utility under limited information-processing capacity in the two channels. As a solution, we obtain an optimal perceptual channel and an optimal action channel that are coupled such that perceptual information is optimized with respect to downstream processing in the action module. The main novelty of this study is that we propose an online optimization procedure to find bounded-optimal perception and action channels in parameterized serial perception-action systems. In particular, we implement the perceptual channel as a multi-layer neural network and the action channel as a multinomial distribution. We illustrate our method in a NAO robot simulator with a simplified cup-lifting task.
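
The information-theoretic coupling can be illustrated with the standard self-consistent update for a single capacity-limited channel, where the policy is a softmax of utility against the action prior and the prior is re-estimated as the marginal. The utilities and capacity parameter below are invented; the paper extends this to two serial channels with parameterized (neural-network) models.

import numpy as np

# Self-consistent update for a capacity-limited action channel:
# p(a|w) is proportional to p(a) * exp(beta * U(w, a)), with p(a) the
# marginal under p(w); a simplified one-channel version of the setup.
U = np.array([[1.0, 0.0, 0.2],     # utility of action a in world state w
              [0.1, 0.9, 0.0]])
p_w = np.array([0.5, 0.5])         # distribution over world states
beta = 3.0                         # information-processing capacity parameter

p_a = np.full(U.shape[1], 1.0 / U.shape[1])
for _ in range(100):
    p_a_given_w = p_a * np.exp(beta * U)
    p_a_given_w /= p_a_given_w.sum(axis=1, keepdims=True)
    p_a = p_w @ p_a_given_w        # updated action marginal
print(p_a_given_w)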

Bounded-Rationality

Information Theory

Reinforcement Learning

A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games

6-11 August 2017, Sydney, Australia

Workshop on Principled Approaches to Deep Learning at the International Conference on Machine Learning, Sydney, Australia, Aug 6-11, 2017

Authors: Felix Leibfried (PROWLER.io), Nate Kushman, and Katja Hofmann

Abstract: Reinforcement learning is concerned with identifying reward-maximizing behaviour policies in environments that are initially unknown. State-of-the-art reinforcement learning approaches, such as deep Q-networks, are model-free and learn to act effectively across a wide range of environments such as Atari games, but require huge amounts of data. Model-based techniques are more data-efficient but need to acquire explicit knowledge about the environment. In this paper, we take a step towards using model-based techniques in environments with a high-dimensional visual state space by demonstrating that it is possible to learn system dynamics and the reward structure jointly. Our contribution is to extend a recently developed deep neural network for video frame prediction in Atari games to enable reward prediction as well. To this end, we phrase a joint optimization problem for minimizing both video frame and reward reconstruction loss, and adapt network parameters accordingly. Empirical evaluations on five Atari games demonstrate accurate cumulative reward prediction of up to 200 frames. We consider these results as opening up important directions for model-based reinforcement learning in complex, initially unknown environments.
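
The joint optimization problem comes down to a single loss combining frame reconstruction with reward prediction. A minimal PyTorch sketch with hypothetical tensor shapes; the actual network follows the video-frame prediction architecture the paper extends.

import torch

def joint_loss(frame_pred, frame_true, reward_logits, reward_true, lam=1.0):
    # Joint objective: video-frame reconstruction plus reward classification,
    # the single optimisation problem the paper phrases for both signals.
    frame_loss = ((frame_pred - frame_true) ** 2).mean()
    reward_loss = torch.nn.functional.cross_entropy(reward_logits, reward_true)
    return frame_loss + lam * reward_loss

# Hypothetical shapes: a batch of predicted frames and 3-way reward classes.
frame_pred, frame_true = torch.rand(8, 3, 84, 84), torch.rand(8, 3, 84, 84)
reward_logits, reward_true = torch.randn(8, 3), torch.randint(0, 3, (8,))
print(joint_loss(frame_pred, frame_true, reward_logits, reward_true).item())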

ATARI Domain

Deep Learning

Reinforcement Learning

Efficiently detecting switches against non-stationary opponents

Published July 2017

Autonomous Agents and Multi-Agent Systems (July 2017) 31: 767.

Authors: Pablo Hernandez-Leal, Yusen Zhan, Matthew E. Taylor, L. Enrique Sucar, and Enrique Munoz de Cote (PROWLER.io)

Abstract: Interactions in multiagent systems are generally more complicated than single-agent ones. Game theory provides solutions on how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real-world scenarios, where agents have limited capacities and may deviate from a perfectly rational response. Our goal is still to act optimally in these cases by learning the appropriate response, without any prior policies on how to act. Thus, we focus on the problem where another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, posing a problem for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses it to obtain an optimal policy, and then (3) determines when it must re-learn due to an opponent strategy switch. We provide theoretical results showing that DriftER is guaranteed to detect switches with high probability. We also provide empirical results showing that our approach outperforms state-of-the-art algorithms, first in normal-form games such as the prisoner's dilemma and then in a more realistic scenario, the Power TAC simulator.
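
Step (3), deciding when to re-learn, can be caricatured as monitoring the opponent model's recent prediction accuracy and declaring a switch when it drops below a threshold. The window, threshold, and stream below are invented; DriftER's actual statistic and its high-probability guarantee are more refined.

from collections import deque

def drift_monitor(correct_stream, window=20, threshold=0.6):
    # Track the opponent model's recent prediction accuracy; flag a switch
    # (and hence the need to re-learn) when it falls below the threshold.
    recent = deque(maxlen=window)
    for t, correct in enumerate(correct_stream):
        recent.append(correct)
        if len(recent) == window and sum(recent) / window < threshold:
            return t  # time step at which a strategy switch is declared
    return None

# Opponent plays predictably for 50 steps, then switches strategy.
stream = [1] * 50 + [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(drift_monitor(stream))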

Game Theory

Learning

Multi-agent Systems

Non-stationary Environments

Repeated Games

Switching Strategies

Differential evolution strategies for large-scale energy resource management in smart grids

15-19 July 2017

Proceedings of the Genetic and Evolutionary Computation Conference, July 15-19 2017

Authors: Fernando Lezama, Enrique Sucar, Joao Soares, Zita Vale, Enrique Munoz de Cote (PROWLER.io)

Abstract: Smart Grid (SG) technologies are leading the modifications of power grids worldwide. The Energy Resource Management (ERM) problem in SGs is highly complex and needs to be addressed efficiently to maximize income while minimizing operational costs. Due to the nature of the problem, which includes mixed-integer variables and non-linear constraints, Evolutionary Algorithms (EA) are considered a good tool for finding optimal and near-optimal solutions to large-scale problems. In this paper, we analyze the application of Differential Evolution (DE) to solve the large-scale ERM problem in SGs through extensive experimentation on a case study using a 33-bus power network with high penetration of Distributed Energy Resources (DER) and Electric Vehicles (EVs), as well as advanced features such as energy stock exchanges and Demand Response (DR) programs. We analyze the impact of DE parameter settings on four state-of-the-art DE strategies. Moreover, the DE strategies are compared with other well-known EAs and a deterministic approach based on MINLP. Results suggest that, even though DE strategies are very sensitive to the setting of their parameters, they can find better solutions than other EAs, and near-optimal solutions in acceptable times compared with the MINLP approach.
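
For reference, the classic DE/rand/1/bin strategy analyzed in such studies looks as follows on a continuous toy objective. The real ERM problem adds mixed-integer variables and non-linear constraints, which this sketch omits.

import numpy as np

def differential_evolution(f, bounds, pop_size=30, F=0.8, CR=0.9, gens=200, seed=0):
    # Classic DE/rand/1/bin: mutate with a scaled difference of two random
    # members, binomially cross with the parent, keep the better vector.
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    cost = np.array([f(x) for x in pop])
    for _ in range(gens):
        for i in range(pop_size):
            idx = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(idx, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(len(lo)) < CR
            cross[rng.integers(len(lo))] = True   # ensure one mutated gene
            trial = np.where(cross, mutant, pop[i])
            if (tc := f(trial)) < cost[i]:
                pop[i], cost[i] = trial, tc
    return pop[cost.argmin()], cost.min()

# Toy cost: sphere function as a stand-in for the ERM objective.
best_x, best_c = differential_evolution(lambda x: (x ** 2).sum(),
                                        np.array([[-5.0, 5.0]] * 4))
print(best_x, best_c)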

Differential Evolution

Evolutionary Algorithms

Multi-agent Systems

Distributed Newton for Network Flow Optimisation

To appear (not yet available online)

SIAM Journal on Optimization

Authors: Rasul Tutunov, Haitham Bou-Ammar (PROWLER.io), and Ali Jadbabaie 

Abstract: In this paper, we propose a new distributed second-order method for network flow optimization. Our algorithm exploits the symmetry and diagonal dominance of the dual Hessian to determine the Newton direction in a distributed fashion up to any precision ε > 0. This is achieved by first introducing a novel distributed solver, based on Chebyshev polynomials, for systems of equations described by symmetric diagonally dominant matrices. Our solver is then used to compute the Newton direction, leading to a novel algorithm exhibiting similar phases of convergence to the exact (i.e., centralized) Newton method. We rigorously analyze the theoretical properties of both the solver and the distributed Newton method. Finally, we provide empirical validation for the proposed technique on a variety of network topologies.
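
A serial sketch of a Chebyshev-polynomial solver for symmetric systems is below; it needs only matrix-vector products and eigenvalue bounds, no inner products, which is part of what makes the approach amenable to distributed implementation. The recurrence is the textbook Chebyshev iteration, not the paper's distributed solver, and the toy matrix is hypothetical.

import numpy as np

def chebyshev_solve(A, b, lam_min, lam_max, tol=1e-8, max_iter=500):
    # Chebyshev iteration for A x = b given eigenvalue bounds
    # [lam_min, lam_max]; only scalar recurrences and mat-vec products.
    d, c = (lam_max + lam_min) / 2, (lam_max - lam_min) / 2
    x = np.zeros_like(b)
    r = b - A @ x
    alpha, p = 0.0, np.zeros_like(b)
    for i in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        if i == 0:
            p, alpha = r.copy(), 1.0 / d
        else:
            # The second step uses beta = (c*alpha)^2 / 2, later ones /4.
            beta = (c * alpha) ** 2 / (2.0 if i == 1 else 4.0)
            alpha = 1.0 / (d - beta / alpha)
            p = r + beta * p
        x += alpha * p
        r -= alpha * (A @ p)
    return x

A = np.array([[3.0, -1.0], [-1.0, 2.0]])  # symmetric, diagonally dominant
b = np.array([1.0, 1.0])
eigs = np.linalg.eigvalsh(A)
print(chebyshev_solve(A, b, eigs[0], eigs[-1]))  # cf. np.linalg.solve(A, b)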

Data Efficiency

Graph Theory

Network Flow

Optimisation

An Exploration Strategy Facing Non-Stationary Agents (JAAMAS Extended Abstract)

8-12 May 2017, AAMAS, Sao Paulo, Brazil

The 16th International Conference on Autonomous Agents and Multiagent Systems, 2017

Authors: P. Hernandez-Leal, Y. Zhan, M. E. Taylor, L. E. Sucar, E. Munoz de Cote

Abstract: Multi-agent systems where the agents interact among themselves and with a stochastic environment can be formalized as stochastic games. We study a subclass of these games, named Markov potential games (MPGs), that appear often in economic and engineering applications when the agents share some common resource. We consider MPGs with continuous state-action variables, coupled constraints, and nonconvex rewards. Previous analysis followed a variational approach that is only valid for very simple cases (convex rewards, invertible dynamics, and no coupled constraints), or considered deterministic dynamics and provided open-loop (OL) analysis, studying strategies that consist of predefined action sequences, which are not optimal for stochastic environments. We present a closed-loop (CL) analysis for MPGs and consider parametric policies that depend on the current state and where agents adapt to stochastic transitions. We provide easily verifiable, sufficient and necessary conditions for a stochastic game to be an MPG, even for complex parametric functions (e.g., deep neural networks), and show that a closed-loop Nash equilibrium (NE) can be found (or at least approximated) by solving a related optimal control problem (OCP). This is useful since solving an OCP, which is a single-objective problem, is usually much simpler than solving the original set of coupled OCPs that form the game, which is a multi-objective control problem. This is a considerable improvement over the previously standard approach for the CL analysis of MPGs, which gives no approximate solution if no NE belongs to the chosen parametric family, and which is practical only for simple parametric forms. We illustrate the theoretical contributions with an example, applying our approach to a noncooperative communications engineering game. We then solve the game with a deep reinforcement learning algorithm that learns policies closely approximating an exact variational NE of the game.

Multi-agent Systems

Reinforcement Learning

Opponent Modelling

Game Theory

Book title: Applications for Future Internet (International Summit, AFI 2016)

25-28 May 2016, International Summit AFI, Puebla, Mexico

Authors: Enrique Sucar (INAOE, Mexico), Oscar Mayora (CREATENET, Italy), Enrique Munoz de Cote (PROWLER.io, UK)

Abstract: This book constitutes the refereed proceedings of the International Summit on Applications for Future Internet, AFI 2016, held in Puebla, Mexico, in May 2016.

The 21 papers presented were carefully selected from 29 submissions and focus on the use of the Future Internet in the biological and health sciences, as well as the increasing application of IoT devices in fields such as smart cities, health, and agriculture.

Multi-agent Systems

IoT

Sensor Networks

Join us to make AI that will change the world

Join our team