Our AI research

Recent papers by PROWLER.io staff and advisors

Three research teams. Three focus areas. 38 papers.
One goal: advancing Principled AI for decision-making.

Probabilistic Modelling

Reinforcement Learning

Multi-agent Systems

Gaussian process modulated Cox processes under linear inequality constraints

April 16 - 18, 2019 AISTATS, Naha, Okinawa, Japan

Authors: Andrés Felipe Lopez Lopera (PROWLER.io intern at the time of this work), ST John, and Nicolas Durrande

Abstract: Gaussian process (GP) modulated Cox processes are widely used to model point patterns in a great variety of applications. Existing approaches require a mapping (link function) between the real-valued GP and the (positive) intensity function. This commonly yields solutions that do not have a closed form or that are restricted to specific covariance functions. We introduce a novel finite approximation of GP-modulated Cox processes where positivity conditions can be imposed directly on the GP, with no restrictions on the covariance function. Furthermore, our approach can ensure other types of inequality constraints (e.g. monotonicity, convexity), resulting in more versatile models that can be used for other classes of point processes (e.g. renewal processes). We demonstrate on both synthetic and real-world data that our framework accurately infers the intensity functions. Where monotonicity is a feature of the process, including this constraint in the inference improves results.
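
As an illustration of why a finite basis makes positivity easy to enforce, here is a minimal NumPy sketch (my own illustration, not the authors' code; the hat basis, kernel, and crude rejection sampler are stand-ins for the paper's truncated-Gaussian machinery): if every coefficient of a piecewise-linear expansion is nonnegative, the interpolated intensity is nonnegative everywhere.

```python
# Minimal sketch (not the paper's inference): a piecewise-linear "hat" basis
# approximation of a GP. If every coefficient xi_j >= 0, then
# sum_j xi_j * phi_j(x) >= 0 everywhere, so positivity is imposed directly on
# the coefficients rather than through a link function.
import numpy as np

def hat_basis(x, knots):
    """Evaluate piecewise-linear hat functions centred at `knots` at points x."""
    h = knots[1] - knots[0]                       # equally spaced knots assumed
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)

knots = np.linspace(0.0, 1.0, 11)
x = np.linspace(0.0, 1.0, 200)
Phi = hat_basis(x, knots)

# GP prior over the coefficients (squared-exponential kernel on the knots).
K = np.exp(-0.5 * (knots[:, None] - knots[None, :]) ** 2 / 0.2 ** 2)
K += 1e-8 * np.eye(len(knots))                    # jitter for numerical stability
rng = np.random.default_rng(0)

# Crude rejection sampler for the positivity-constrained prior; the paper uses
# proper truncated multivariate Gaussian samplers instead.
while True:
    xi = rng.multivariate_normal(2.0 * np.ones(len(knots)), K)
    if np.all(xi >= 0.0):
        break

intensity = Phi @ xi                              # nonnegative everywhere
assert intensity.min() >= 0.0
```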

Probabilistic Modelling

Gaussian Processes

Cox Processes

Point Processes

Renewal Processes

Dimension Reduction

Truncated Gaussian Distribution

Banded Matrix Operators for Gauss-Markov Models in the Automatic Differentiation Era

April 16 - 18, 2019 AISTATS, Naha, Okinawa, Japan

Authors: Nicolas Durrande, Vincent Adam, Lucas Bordeaux, Stefanos Eleftheriadis and James Hensman

Abstract: Banded matrices can be used to express several models including linear state-space models, some Gaussian processes, and Gauss-Markov random fields. Whilst software libraries such as TensorFlow, PyTorch and Stan are changing the face of machine learning and statistics, banded matrices have been underutilized; the banded representation of models can avoid the inefficient use of loops in high level programming languages and allows easy construction of more complex models from constituent parts (e.g. additive or deep GPs). In this work we revisit the banded representation of several models and examine which banded matrix operations are required to implement inference using variational inference or gradient-based sampling. We collect the necessary operators and derive their reverse-mode derivatives, which can all be executed in linear time.
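
To illustrate why banded structure matters, here is a minimal sketch using SciPy's banded solver as a stand-in for the paper's TensorFlow operators (the random-walk model and all constants are illustrative): the posterior precision of a Gauss-Markov model is tridiagonal, so the smoothing solve costs O(n) rather than O(n³).

```python
# Minimal sketch: the posterior precision of a linear Gauss-Markov model is
# banded (tridiagonal here), so solves cost O(n) instead of O(n^3).
import numpy as np
from scipy.linalg import solveh_banded

n, q, noise = 1000, 0.1, 0.5          # steps, process noise, observation noise
y = np.random.default_rng(1).normal(size=n)

# Tridiagonal precision of a random-walk prior plus iid Gaussian likelihood,
# stored in lower banded form: row 0 = main diagonal, row 1 = sub-diagonal.
ab = np.zeros((2, n))
ab[0, :] = 2.0 / q + 1.0 / noise      # interior diagonal entries
ab[0, 0] = 1.0 / q + 1.0 / noise      # boundary corrections
ab[0, -1] = 1.0 / q + 1.0 / noise
ab[1, :-1] = -1.0 / q                 # sub-diagonal

# Posterior mean of the latent states: solve (Q_prior + Q_lik) m = y / noise.
posterior_mean = solveh_banded(ab, y / noise, lower=True)
print(posterior_mean[:5])
```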

Probabilistic Modelling

Banded Matrices

Markov Structures

TensorFlow

Gaussian Processes

Variational Inference

Neural network ensembles and variational inference revisited

December 2nd 2018, AABI, Montreal, Canada

Authors: Marcin Tomczak (PROWLER.io), Siddharth Swaroop (University of Cambridge), Richard Turner (University of Cambridge)

Abstract: Ensembling methods and variational inference provide two orthogonal methods for obtaining reliable predictive uncertainty estimates for neural networks. In this work we compare and combine these approaches, finding that: i) variational inference outperforms ensembles of neural networks, and ii) ensembled versions of variational inference bring further improvements. The first finding appears at odds with previous work (Lakshminarayanan et al., 2017), but we show that the previous results were due to an ambiguous experimental protocol in which the model and inference method were simultaneously changed.
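
For context, here is a minimal sketch of the standard way an ensemble of Gaussian predictive distributions is merged into one predictive mean and variance (moment matching of the mixture); this is generic deep-ensemble practice, not code from the paper.

```python
# Minimal sketch: combine an ensemble of Gaussian predictions by
# moment-matching the resulting mixture distribution.
import numpy as np

def combine_ensemble(means, variances):
    """means, variances: arrays of shape (n_members, n_points)."""
    mu = means.mean(axis=0)
    # Mixture second moment minus squared mixture mean.
    var = (variances + means ** 2).mean(axis=0) - mu ** 2
    return mu, var

means = np.array([[0.9, 2.1], [1.1, 1.8], [1.0, 2.3]])
variances = np.array([[0.2, 0.4], [0.3, 0.5], [0.25, 0.45]])
mu, var = combine_ensemble(means, variances)
print(mu, var)   # ensemble disagreement inflates the predictive variance
```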

Reinforcement Learning

Ensembles

Variational Inference

Approximate Inference

Uncertainty Estimation

Monte-Carlo Tree Search for Constrained POMDPs

December 2 - 8, 2018 NeurIPS, Montreal, Canada

Authors: Jongmin Lee (KAIST), Geon-hyeong Kim (KAIST), Pascal Poupart (University of Waterloo), and Kee-Eung Kim (KAIST & PROWLER.io)

Abstract: Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. However, many real-world problems inherently have multiple goals, where multi-objective formulations are more natural. The constrained POMDP (CPOMDP) is such a model that maximizes the reward while constraining the cost, extending the standard POMDP model. To date, solution methods for CPOMDPs assume an explicit model of the environment, and thus are hardly applicable to large-scale real-world problems. In this paper, we present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and only requires a black-box simulator of the environment. In the experiments, we demonstrate that CC-POMCP converges to the optimal stochastic action selection in CPOMDP and pushes the state-of-the-art by being able to scale to very large problems.
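
A minimal sketch of the flavour of the action-selection rule: reward and cost value estimates are scalarised with a multiplier λ (in CC-POMCP this multiplier comes from the induced LP) and combined with a UCB exploration bonus. The Node bookkeeping and numbers here are illustrative, not the paper's implementation.

```python
# Minimal sketch of cost-penalised UCB action selection at a search node.
import math

class Node:
    def __init__(self, actions):
        # per-action visit count n, mean reward value q_r, mean cost value q_c
        self.stats = {a: {"n": 0, "q_r": 0.0, "q_c": 0.0} for a in actions}

def select_action(node, lam, c_ucb=1.0):
    total = sum(s["n"] for s in node.stats.values()) + 1
    def score(a):
        s = node.stats[a]
        if s["n"] == 0:
            return float("inf")                       # try unvisited actions first
        bonus = c_ucb * math.sqrt(math.log(total) / s["n"])
        return s["q_r"] - lam * s["q_c"] + bonus      # cost-penalised UCB value
    return max(node.stats, key=score)

root = Node(["stay", "go"])
root.stats["stay"].update(n=10, q_r=1.0, q_c=0.2)
root.stats["go"].update(n=5, q_r=1.5, q_c=0.9)
print(select_action(root, lam=1.0))                   # trades reward against cost
```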

Reinforcement Learning

NeurIPS

Monte-Carlo Tree Search

Constrained Optimisation

A Bayesian Approach to Generative Adversarial Imitation Learning

December 2 - 8, 2018 NeurIPS, Montreal, Canada

Authors: Wonseok Jeon (KAIST), Seokin Seo (KAIST), and Kee-Eung Kim (KAIST & PROWLER.io)

Abstract: Generative adversarial training for imitation learning has shown promising results on high-dimensional and continuous control tasks. This paradigm is based on reducing the imitation learning problem to the density matching problem, where the agent iteratively refines the policy to match the empirical state-action visitation frequency of the expert demonstration. Although this approach has been shown to robustly learn to imitate even with scarce demonstration, one must still address the inherent challenge that collecting trajectory samples in each iteration is a costly operation. To address this issue, we first propose a Bayesian formulation of generative adversarial imitation learning (GAIL), where the imitation policy and the cost function are represented as stochastic neural networks. Then, we show that we can significantly enhance the sample efficiency of GAIL leveraging the predictive density of the cost, on an extensive set of imitation learning tasks with high-dimensional states and actions.

Reinforcement Learning

NeurIPS

Adversarial Training

Imitation Learning

Generative Model

Infinite Horizon Gaussian Processes

December 2 - 8, 2018 NeurIPS, Montreal, Canada

Authors: Arno Solin (Aalto Yliopisto), Richard Turner (University of Cambridge) and James Hensman (PROWLER.io)

Abstract: Gaussian processes provide a flexible framework for forecasting, removing noise, and interpreting long temporal datasets. State space modelling (Kalman filtering) enables these non-parametric models to be deployed on long datasets by reducing the complexity to linear in the number of data points. The complexity is still cubic in the state dimension m. In certain special cases (Gaussian likelihood, regular spacing) the GP posterior will reach a steady posterior state when the data is very long. We leverage this and formulate an inference scheme for GPs with general likelihoods, where inference is based on single-sweep EP (assumed density filtering). The infinite-horizon model tackles the cubic cost in the state dimension m, reducing it to O(m²) per data point. The model is extended to online learning of hyperparameters. We show examples for large finite-length modelling problems, and present how the method runs in real-time on a smartphone on a continuous data stream updated at 100 Hz.
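
A minimal NumPy sketch of the underlying idea (illustrative model and constants, not the authors' code): for regularly spaced data the predictive covariance of the Kalman filter converges, so the gain can be precomputed once and each update then costs O(m²) rather than O(m³).

```python
# Minimal sketch: precompute the steady-state Kalman gain by iterating the
# discrete Riccati recursion, then filter a stream with the constant gain.
import numpy as np

def steady_state_gain(A, Q, H, R, iters=500):
    """Iterate the discrete Riccati recursion to its fixed point."""
    P = Q.copy()
    for _ in range(iters):
        P = A @ P @ A.T + Q                              # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        P = P - K @ H @ P                                # update
    return K

# Matern-3/2-like 2-state model (values are illustrative, not from the paper).
dt, lam = 0.01, 2.0
A = np.array([[1.0, dt], [-lam**2 * dt, 1.0 - 2 * lam * dt]])
Q = np.diag([1e-6, 4 * lam**3 * dt])
H = np.array([[1.0, 0.0]])
R = np.array([[0.1]])

K = steady_state_gain(A, Q, H, R)
m = np.zeros(2)
for y in np.sin(np.linspace(0, 3, 300)):                 # stream of observations
    m = A @ m                                            # constant-gain filter step
    m = m + (K @ (np.array([y]) - H @ m)).ravel()
print(m)
```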

Probabilistic Modelling

NeurIPS

Gaussian Processes

Gauss-Markov Models

Orthogonally Decoupled Variational Gaussian Processes

December 2 - 8, 2018 NeurIPS, Montreal, Canada

Authors: Hugh Salimbeni (PROWLER.io and Imperial College London), Ching-An Cheng (Georgia Institute of Technology), Byron Boots (Georgia Institute of Technology), Marc Deisenroth (PROWLER.io and Imperial College London)

Abstract: Gaussian processes provide a powerful non-parametric framework for reasoning over functions. Despite appealing theory, their superlinear computational and memory complexities have presented a long-standing challenge. The state-of-the-art methods of sparse variational inference trade modeling accuracy with complexity. However, their complexities still scale superlinearly in the number of basis functions, so they can learn from large datasets only when a small model is used. Recently, a decoupled approach was proposed to remove the unnecessary coupling between the complexities of modeling the mean and the covariance functions. It achieves a linear complexity in the number of mean parameters, so an expressive posterior mean function can be modeled. While promising, this approach suffers from optimization difficulties due to ill-conditioning and non-convexity. In this work, we propose an alternative decoupled parametrization. It adopts an orthogonal basis in the mean function to model the residues that cannot be learned by the standard coupled approach. Therefore, our method extends, rather than replaces, the coupled approach to achieve strictly better performance. This construction admits a straightforward natural gradient update rule, so the structure of the information manifold that is lost during decoupling can be leveraged to speed up learning. Empirically, our algorithm demonstrates significantly faster convergence in multiple experiments.

Probabilistic Modelling

NeurIPS

Gaussian Processes

Bandit Learning in Concave N-Person Games

December 2 - 8, 2018 NeurIPS, Montreal, Canada

Authors: Mario Bravo (Universidad de Santiago de Chile), Panayotis Mertikopoulos (Univ. Grenoble Alpes), David Leslie (PROWLER.io)

Abstract: This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents’ most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players’ behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.
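
A minimal sketch of one player's side of such a scheme, assuming a toy concave payoff and a Euclidean mirror map (so the mirror-descent step reduces to projected gradient ascent): a single bandit payoff query per round yields a one-point gradient estimate.

```python
# Minimal sketch of no-regret learning with bandit (payoff-only) feedback.
# Payoff function, schedules, and constants are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
d = 2
x = np.full(d, 0.5)                               # current action in [0, 1]^d

def payoff(a):                                    # toy concave payoff
    return -np.sum((a - 0.3) ** 2)

for t in range(1, 5001):
    z = rng.normal(size=d)
    z /= np.linalg.norm(z)                        # random direction on the sphere
    delta, step = t ** -0.25, t ** -0.75          # shrinking query radius and step
    u = payoff(np.clip(x + delta * z, 0.0, 1.0))  # the single bandit query
    grad_est = (d / delta) * u * z                # one-point gradient estimate
    x = np.clip(x + step * grad_est, 0.0, 1.0)    # ascent step + projection

print(x)                                          # noisy estimate near the maximiser (0.3, 0.3)
```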

Multi-agent Systems

NeurIPS

Game Theory

Online Learning

Learning Invariances using the Marginal Likelihood

December 2 - 8, 2018 NeurIPS, Montreal, Canada

Authors: Mark van der Wilk (PROWLER.io), Matthias Bauer (University of Cambridge and the Max Planck institute in Tübingen), ST John (PROWLER.io), James Hensman (PROWLER.io)

Abstract: Generalising well in supervised learning tasks relies on correctly extrapolating the training data to a large region of the input space. One way to achieve this is to constrain the predictions to be invariant to transformations on the input that are known to be irrelevant (e.g. translation). Commonly, this is done through data augmentation, where the training set is enlarged by applying hand-crafted transformations to the inputs. We argue that invariances should instead be incorporated in the model structure, and learned using the marginal likelihood, which correctly rewards the reduced complexity of invariant models. We demonstrate this for Gaussian process models, due to the ease with which their marginal likelihood can be estimated. Our main contribution is a variational inference scheme for Gaussian processes containing invariances described by a sampling procedure. We learn the sampling procedure by back-propagating through it to maximise the marginal likelihood.
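
A minimal sketch of the modelling idea for a 1-D input with random shifts as the transformation (illustrative choices, not the paper's setup): an approximately invariant kernel is obtained by Monte Carlo averaging a base kernel over sampled transformations. The paper learns the transformation distribution by maximising the marginal likelihood, and the finite-sample average here is only an approximation of the exact invariant kernel.

```python
# Minimal sketch: a kernel made (approximately) invariant to small input
# shifts by averaging a base kernel over sampled transformations.
import numpy as np

rng = np.random.default_rng(0)

def base_kernel(a, b, lengthscale=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

def invariant_kernel(a, b, n_samples=50, shift_scale=0.3):
    k = np.zeros((len(a), len(b)))
    for _ in range(n_samples):
        ta = a + rng.normal(scale=shift_scale, size=len(a))   # sampled transformation of a
        tb = b + rng.normal(scale=shift_scale, size=len(b))   # sampled transformation of b
        k += base_kernel(ta, tb)
    return k / n_samples                                      # Monte Carlo average

x = np.linspace(0, 1, 5)
print(invariant_kernel(x, x).round(2))
```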

Probabilistic Modelling

NeurIPS

Gaussian Processes

Gaussian Process Conditional Density Estimation

December 2 - 8, 2018 NeurIPS, Montreal, Canada

Authors: Vincent Dutordoir (PROWLER.io), Hugh Salimbeni (PROWLER.io and Imperial College London), Marc Deisenroth (PROWLER.io and Imperial College London), James Hensman (PROWLER.io)

Abstract: Conditional Density Estimation (CDE) models deal with estimating conditional distributions. The conditions imposed on the distribution are the inputs of the model. CDE is a challenging task as there is a fundamental trade-off between model complexity, representational capacity and overfitting. In this work, we propose to extend the model's input with latent variables and use Gaussian processes (GP) to map this augmented input onto samples from the conditional distribution. Our Bayesian approach allows for the modeling of small datasets, but we also provide the machinery for it to be applied to big data using stochastic variational inference. Our approach can be used to model densities even in sparse data regions, and allows for sharing learned structure between conditions. We illustrate the effectiveness and wide-reaching applicability of our model on a variety of real-world problems, such as spatio-temporal density estimation of taxi drop-offs, non-Gaussian noise modeling, and few-shot learning on omniglot images.

Probabilistic Modelling

NeurIPS

Gaussian Processes

Distributed Multitask Reinforcement Learning with Quadratic Convergence

December 2 - 8, 2018 NeurIPS, Montreal, Canada

Authors: Rasul Tutunov (PROWLER.io), Dongho Kim (PROWLER.io), Haitham Bou-Ammar (PROWLER.io)

Abstract: Multitask reinforcement learning (MTRL) suffers from scalability issues when the number of tasks or trajectories grows large. The main reason behind this drawback is the reliance on centralised solutions. Recent methods exploited the connection between MTRL and general consensus to propose scalable solutions. These methods, however, suffer from two drawbacks. First, they rely on predefined objectives, and, second, exhibit linear convergence guarantees. In this paper, we improve over the state of the art by deriving multitask reinforcement learning from a variational inference perspective. We then propose a novel distributed solver for MTRL with quadratic convergence guarantees.

Reinforcement Learning

NeurIPS

Multitask Learning

Distributed Optimisation

Data Efficiency

Scalability

Many-Body Coarse-Grained Interactions Using Gaussian Approximation Potentials

Presented 20 February 2018, University of Manchester

Authors: S.T. John (PROWLER.io), Gábor Csányi (University of Cambridge)

The Journal of Physical Chemistry (citation: J. Phys. Chem. B 121, 48, 10934-10949)

Abstract: We introduce a computational framework that is able to describe general many-body coarse-grained (CG) interactions of molecules and use it to model the free energy surface of molecular liquids as a cluster expansion in terms of monomer, dimer, and trimer terms. The contributions to the free energy due to these terms are inferred from all-atom molecular dynamics (MD) data using Gaussian Approximation Potentials, a type of machine-learning model that employs Gaussian process regression. The resulting CG model is much more accurate than those possible using pair potentials. Though slower than the latter, our model can still be faster than all-atom simulations for solvent-free CG models commonly used in biomolecular simulations.

Gaussian Processes

High-Dimensional Representation

Probabilistic Modelling

DeepCoder: Semi-parametric Variational Autoencoders for Automatic Facial Action Coding

Published October 2017

The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 3190-3199

Authors: Dieu Linh Tran, Robert Walecki, Ognjen (Oggi) Rudovic, Stefanos Eleftheriadis, Bjorn Schuller, Maja Pantic

Abstract: The human face exhibits an inherent hierarchy in its representations (i.e., holistic facial expressions can be encoded via a set of facial action units (AUs) and their intensity). Variational (deep) auto-encoders (VAE) have shown great results in unsupervised extraction of hierarchical latent representations from large amounts of image data, while being robust to noise and other undesired artifacts. Potentially, this makes VAEs a suitable approach for learning facial features for AU intensity estimation. Yet, most existing VAE-based methods apply classifiers learned separately from the encoded features. By contrast, the non-parametric (probabilistic) approaches, such as Gaussian Processes (GPs), typically outperform their parametric counterparts, but cannot deal easily with large amounts of data. To this end, we propose a novel VAE semi-parametric modeling framework, named DeepCoder, which combines the modeling power of parametric (convolutional) and non-parametric (ordinal GPs) VAEs, for joint learning of (1) latent representations at multiple levels in a task hierarchy, and (2) classification of multiple ordinal outputs. We show on benchmark datasets for AU intensity estimation that the proposed DeepCoder outperforms the state-of-the-art approaches, and related VAEs and deep learning models.

Deep Learning

Gaussian Processes

Probabilistic Modelling

Gaussian Process Domain Experts for Modeling of Facial Affect

Published 28 June 2017

IEEE Transactions on Image Processing

Authors: Stefanos Eleftheriadis (PROWLER.io), Ognjen Rudovic (MIT Media Lab), Marc Peter Deisenroth (PROWLER.io, Imperial College London), Maja Pantic (Imperial College London).

Abstract: Most existing models for facial behavior analysis rely on generic classifiers, which fail to generalize well to previously unseen data. This is because of inherent differences in source (training) and target (test) data, mainly caused by variation in subjects' facial morphology, camera views, and so on. All of these account for different contexts in which target and source data are recorded, and thus, may adversely affect the performance of the models learned solely from source data. In this paper, we exploit the notion of domain adaptation and propose a data efficient approach to adapt already learned classifiers to new unseen contexts. Specifically, we build upon the probabilistic framework of Gaussian processes (GPs), and introduce domain-specific GP experts (e.g., for each subject). The model adaptation is facilitated in a probabilistic fashion, by conditioning the target expert on the predictions from multiple source experts. We further exploit the predictive variance of each expert to define an optimal weighting during inference. We evaluate the proposed model on three publicly available data sets for multi-class (MultiPIE) and multi-label (DISFA, FERA2015) facial expression analysis by performing adaptation of two contextual factors: “where” (view) and “who” (subject). In our experiments, the proposed approach consistently outperforms: 1) both source and target classifiers, while using a small number of target examples during the adaptation and 2) related state-of-the-art approaches for supervised domain adaptation.
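
A minimal sketch of variance-based weighting of experts, in the spirit of (but not identical to) the paper's scheme: each expert contributes a Gaussian prediction and more confident experts receive more weight via a precision-weighted (product-of-Gaussians) combination.

```python
# Minimal sketch: precision-weighted combination of expert GP predictions.
import numpy as np

def combine_experts(means, variances):
    """Combine per-expert Gaussian predictions by inverse-variance weighting."""
    precisions = 1.0 / variances
    var = 1.0 / precisions.sum(axis=0)
    mean = var * (precisions * means).sum(axis=0)
    return mean, var

means = np.array([[0.2], [0.6], [0.4]])       # three experts, one test point
variances = np.array([[0.5], [0.05], [0.2]])  # the second expert is most confident
print(combine_experts(means, variances))      # result is pulled towards the confident expert
```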

Data Efficiency

Domain Adaptation

Gaussian Processes

Probabilistic Modelling

Distributed Lifelong Reinforcement Learning with Sub-Linear Regret

17-19 December 2018, Miami Beach

IEEE Conference on Decision and Control (CDC), Miami Beach, Dec. 17-19, 2018 (To Appear)

Authors: Julia El-Zini, Rasul Tutunov (PROWLER.io), Haitham Bou-Ammar (PROWLER.io), and Ali Jadbabaie

Abstract: In this paper, we propose a distributed second-order method for lifelong reinforcement learning (LRL). Upon observing a new task, our algorithm scales state-of-the-art LRL by approximating the Newton direction up to any arbitrary precision ε > 0, while guaranteeing accurate solutions. We analyze the theoretical properties of this new method and derive, for the first time to the best of our knowledge, sublinear regret under this setting.

Distributed Optimisation

Learning to Learn

Lifelong Learning

Online Learning

Reinforcement Learning

Scalable Lifelong Reinforcement Learning

Published December 2017

Pattern Recognition, Volume 72, Dec. 2017

Authors: Yusen Zhan, Haitham Bou-Ammar (PROWLER.io), and Matthew E. Taylor 

Abstract: Lifelong reinforcement learning provides a successful framework for agents to learn multiple consecutive tasks sequentially. Current methods, however, suffer from scalability issues when the agent has to solve a large number of tasks. In this paper, we remedy the above drawbacks and propose a novel scalable technique for lifelong reinforcement learning. We derive an algorithm which assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange. We then show an improvement over current lifelong policy search methods, reaching a linear convergence rate. Finally, we evaluate our technique on a set of benchmark dynamical systems and demonstrate learning speed-ups and reduced running times.

Data Efficiency

Distributed Optimisation

Learning to Learn

Lifelong Learning

Multitask Learning

Online Learning

Reinforcement Learning

Scalability

Correctness-by-Learning of Infinite-State Component-Based Systems

10-13 October 2017, Braga, Portugal

International Conference on Formal Aspects of Component Software, Braga, Portugal, 10-13 Oct. 2017 

Authors: Haitham Bou-Ammar (PROWLER.io), Mohamad Jaber, and Mohammad Nassar 

Abstract: We introduce a novel framework for runtime enforcement of safe executions in component-based systems with multi-party interactions modelled using BIP. Our technique frames runtime enforcement as a sequential decision-making problem and presents two alternatives for learning optimal strategies that ensure fairness between correct traces. We target both finite and infinite state-spaces. In the finite case, we guarantee that the system avoids bad states by casting the learning process as one of determining a fixed-point solution that converges to the optimal strategy. Though successful, this technique fails to generalize to the infinite case due to the need for building a dictionary, which quantifies the performance of each state-interaction pair. As such, we further contribute by generalizing our framework to support the infinite setting. Here, we adapt ideas from function approximators and machine learning to encode each state-interaction pair's performance. In essence, we autonomously learn to abstract similarly performing states in a relevant continuous space through the use of deep learning. We assess our method empirically by presenting a fully implemented tool called RERL. In particular, we use RERL to: 1) enforce deadlock freedom on a dining philosophers benchmark, and 2) allow for pair-wise synchronized robots to autonomously achieve consensus within a cooperative multi-agent setting.
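
A minimal sketch of the finite-state idea under strong simplifying assumptions (a tiny deterministic transition system and a penalty-based reward, none of which come from the paper): enforcement is cast as computing a fixed point of a Bellman-style update in which bad states are heavily penalised, and the enforced interaction is read off from the resulting values.

```python
# Minimal sketch: value iteration to a fixed point that steers a toy
# component system away from a "bad" state.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9
bad = {4}
# transitions[s][a] = next state (deterministic toy system)
transitions = [[1, 2], [2, 3], [3, 4], [0, 4], [4, 4]]

reward = np.zeros((n_states, n_actions))
for s in range(n_states):
    for a in range(n_actions):
        reward[s, a] = -100.0 if transitions[s][a] in bad else 0.0

V = np.zeros(n_states)
for _ in range(200):                      # iterate to the fixed point
    Q = reward + gamma * V[np.array(transitions)]
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)                 # interaction to allow in each state
print(policy)                             # avoids interactions that lead into bad state 4
```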

Autonomous Formal Verification Methods

Reinforcement Learning

Software Engineering

Non-convex Policy Search Using Variational Inequalities

Published October 2017

Journal of Neural Computation, Volume 29, Issue 10, MIT Press, Oct. 2017

Authors: Yusen Zhan, Haitham Bou-Ammar (PROWLER.io), and Matthew E. Taylor 

Abstract: Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that could potentially damage hardware units. Motivated by such constraints, projection-based methods have been proposed for safe policies. These methods, however, can handle only convex policy constraints. In this letter, we propose the first safe policy search reinforcement learner capable of operating under nonconvex policy constraints. This is achieved by observing, for the first time, a connection between nonconvex variational inequalities and policy search problems. We provide two algorithms, Mann and two-step iteration, to solve the above problems and prove convergence in the nonconvex stochastic setting. Finally, we demonstrate the performance of the algorithms on six benchmark dynamical systems and show that our new method is capable of outperforming previous methods under a variety of settings.
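
A minimal sketch of a Mann (averaged fixed-point) iteration, with a projected-gradient map on a toy convex problem standing in for the paper's policy-search operators: x_{k+1} = (1 - a_k) x_k + a_k T(x_k).

```python
# Minimal sketch of a Mann iteration; the operator and constraint set are
# illustrative stand-ins, not the paper's policy-search construction.
import numpy as np

def project(x):                            # projection onto the unit ball
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def T(x, step=0.1):
    """One projected-gradient step on f(x) = ||x - c||^2 over the unit ball."""
    c = np.array([2.0, 0.0])
    return project(x - step * 2.0 * (x - c))

x = np.array([0.0, 0.5])
a_k = 0.5                                  # constant averaging weight; Mann allows a schedule a_k in (0, 1)
for _ in range(200):
    x = (1 - a_k) * x + a_k * T(x)

print(x)                                   # close to the constrained optimum (1, 0)
```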

Non-Convexity

Optimisation

Reinforcement Learning

An Information-Theoretic On-Line Update Principle for Perception-Action Coupling

24-28 September 2017, Vancouver, Canada

IEEE/RSJ International Conference on Intelligent Robots and Systems, Vancouver, Canada, Sep 24-28, 2017

Authors: Zhen Peng, Tim Genewein, Felix Leibfried (PROWLER.io), and Daniel Braun

Abstract: Inspired by findings of sensorimotor coupling in humans and animals, there has recently been a growing interest in the interaction between action and perception in robotic systems. Here we consider perception and action as two serial information channels with limited information-processing capacity. Following previous work, we formulate a constrained optimization problem that maximizes utility under limited information-processing capacity in the two channels. As a solution, we obtain an optimal perceptual channel and an optimal action channel that are coupled such that perceptual information is optimized with respect to downstream processing in the action module. The main novelty of this study is that we propose an online optimization procedure to find bounded-optimal perception and action channels in parameterized serial perception-action systems. In particular, we implement the perceptual channel as a multi-layer neural network and the action channel as a multinomial distribution. We illustrate our method in a NAO robot simulator with a simplified cup lifting task.
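
A minimal sketch of the bounded-rational building block behind this line of work (a single capacity-limited action channel found by Blahut-Arimoto-style alternation); the utility matrix and β are illustrative, and the paper's contribution is the online, parametrised extension of this idea to serial perception-action channels.

```python
# Minimal sketch: capacity-limited channel p(a|x) proportional to
# p(a) * exp(beta * U(x, a)), alternated with updates of the marginal p(a).
import numpy as np

U = np.array([[1.0, 0.0, 0.2],          # utility U[x, a] for 2 world states, 3 actions
              [0.0, 1.0, 0.2]])
p_x = np.array([0.5, 0.5])              # distribution over world states
beta = 2.0                              # information-processing resource

p_a = np.full(3, 1.0 / 3.0)             # marginal over actions
for _ in range(100):
    p_a_given_x = p_a * np.exp(beta * U)            # unnormalised channel
    p_a_given_x /= p_a_given_x.sum(axis=1, keepdims=True)
    p_a = p_x @ p_a_given_x                          # update the marginal

print(np.round(p_a_given_x, 3))         # soft, capacity-limited policy per state
```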

Bounded-Rationality

Information Theory

Reinforcement Learning

A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games

6-11 August 2017, Sydney, Australia

Workshop on Principled Approaches to Deep Learning at the International Conference on Machine Learning, Sydney, Australia, Aug 6-11, 2017

Authors: Felix Leibfried (PROWLER.io), Nate Kushman, and Katja Hofmann

Abstract: Reinforcement learning is concerned with identifying reward-maximizing behaviour policies in environments that are initially unknown. State-of-the-art reinforcement learning approaches, such as deep Q-networks, are model-free and learn to act effectively across a wide range of environments such as Atari games, but require huge amounts of data. Model-based techniques are more data-efficient but need to acquire explicit knowledge about the environment. In this paper, we take a step towards using model-based techniques in environments with a high-dimensional visual state space by demonstrating that it is possible to learn system dynamics and the reward structure jointly. Our contribution is to extend a recently developed deep neural network for video frame prediction in Atari games to enable reward prediction as well. To this end, we phrase a joint optimization problem for minimizing both video frame and reward reconstruction loss, and adapt network parameters accordingly. Empirical evaluations on five Atari games demonstrate accurate cumulative reward prediction of up to 200 frames. We consider these results as opening up important directions for model-based reinforcement learning in complex, initially unknown environments.

ATARI Domain

Deep Learning

Reinforcement Learning

Efficiently detecting switches against non-stationary opponents

Published July 2017

Journal of Autonomous Agent Multi-Agent Systems (July 2017) 31: 767.

Authors: Pablo Hernandez-Leal, Yusen Zhan, Matthew E. Taylor, L. Enrique Sucar, and Enrique Munoz de Cote (PROWLER.io)

Abstract: Interactions in multiagent systems are generally more complicated than single-agent ones. Game theory provides solutions on how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real-world scenarios where agents have limited capacities and may deviate from a perfectly rational response. Our goal is still to act optimally in these cases by learning the appropriate response and without any prior policies on how to act. Thus, we focus on the problem where another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, posing a problem for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses it to obtain an optimal policy, and then (3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER is guaranteed to detect switches with high probability. We also provide empirical results showing that our approach outperforms state-of-the-art algorithms, in normal-form games such as the prisoner's dilemma and then in a more realistic scenario, the Power TAC simulator.
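
A minimal sketch of the switch-detection idea with an illustrative sliding-window accuracy test (the paper derives a statistically grounded criterion rather than this fixed threshold): keep a model of the opponent, track how well it predicts recent opponent moves, and trigger re-learning when prediction quality drifts.

```python
# Minimal sketch: detect an opponent strategy switch from a drop in the
# accuracy of the learned opponent model. Window and threshold are illustrative.
from collections import deque

class SwitchDetector:
    def __init__(self, window=30, threshold=0.6):
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, predicted_move, actual_move):
        self.recent.append(predicted_move == actual_move)
        accuracy = sum(self.recent) / len(self.recent)
        # Only decide once the window is full; low accuracy signals a switch.
        return len(self.recent) == self.recent.maxlen and accuracy < self.threshold

detector = SwitchDetector()
for t in range(100):
    predicted = "cooperate"
    actual = "cooperate" if t < 50 else "defect"     # opponent switches at t = 50
    if detector.observe(predicted, actual):
        print("switch detected at step", t)
        break
```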

Game Theory

Learning

Multi-agent Systems

Non-stationary Environments

Repeated Games

Switching Strategies

Differential evolution strategies for large-scale energy resource management in smart grids

15-19 July 2017

Proceedings of the Genetic and Evolutionary Computation Conference, July 15-19 2017

Authors: Fernando Lezama, Enrique Sucar, Joao Soares, Zita Vale, Enrique Munoz de Cote (PROWLER.io)

Abstract: Smart Grid (SG) technologies are leading the modifications of power grids worldwide. The Energy Resource Management (ERM) in SGs is a highly complex problem that needs to be efficiently addressed to maximize incomes while minimizing operational costs. Due to the nature of the problem, which includes mixed-integer variables and non-linear constraints, Evolutionary Algorithms (EA) are considered a good tool to find optimal and near-optimal solutions to large-scale problems. In this paper, we analyze the application of Differential Evolution (DE) to solve the large-scale ERM problem in SGs through extensive experimentation on a case study using a 33-Bus power network with high penetration of Distributed Energy Resources (DER) and Electric Vehicles (EVs), as well as advanced features such as energy stock exchanges and Demand Response (DR) programs. We analyze the impact of DE parameter setting on four state-of-the-art DE strategies. Moreover, DE strategies are compared with other well-known EAs and a deterministic approach based on MINLP. Results suggest that, even when DE strategies are very sensitive to the setting of their parameters, they can find better solutions than other EAs, and near-optimal solutions in acceptable times compared with an MINLP approach.
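
For reference, here is a minimal sketch of the classic DE/rand/1/bin strategy on a toy continuous objective; the actual ERM problem is mixed-integer and heavily constrained, which the paper handles with additional machinery.

```python
# Minimal sketch of DE/rand/1/bin: differential mutation, binomial crossover,
# and greedy selection, applied to a toy sphere objective.
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):                                   # toy objective to minimise
    return np.sum(x ** 2)

dim, pop_size, F, CR, generations = 10, 30, 0.8, 0.9, 200
pop = rng.uniform(-5, 5, size=(pop_size, dim))
fitness = np.array([sphere(ind) for ind in pop])

for _ in range(generations):
    for i in range(pop_size):
        r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])           # differential mutation
        cross = rng.random(dim) < CR
        cross[rng.integers(dim)] = True                      # ensure at least one gene crosses
        trial = np.where(cross, mutant, pop[i])              # binomial crossover
        f_trial = sphere(trial)
        if f_trial <= fitness[i]:                            # greedy selection
            pop[i], fitness[i] = trial, f_trial

print(fitness.min())                                         # near 0 after convergence
```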

Differential Evolution

Evolutionary Algorithms

Multi-agent Systems

Distributed Newton for Network Flow Optimisation

To appear (not yet available online)

SIAM Journal on Optimization

Authors: Rasul Tutunov, Haitham Bou-Ammar (PROWLER.io), and Ali Jadbabaie 

Abstract: In this paper, we propose a new distributed second-order method for network flow optimization. Our algorithm exploits the symmetry and diagonal dominance property of the dual Hessian to determine the Newton direction in a distributed fashion up to any precision ε > 0. This is achieved by first introducing a novel distributed solver for systems of equations described by symmetric diagonally dominant matrices based on Chebyshev polynomials. Our solver is then used to compute the Newton direction, leading to a novel algorithm exhibiting similar phases of convergence to the exact (i.e., centralized) Newton method. We rigorously analyze the theoretical properties of both the solver and the distributed Newton method. Finally, we provide empirical validation for the proposed technique on a variety of network topologies.

Data Efficiency

Graph Theory

Network Flow

Optimisation

An Exploration Strategy Facing Non-Stationary Agents (JAAMAS Extended Abstract)

May 8 - 12, 2017 AAMAS, Sao Paulo, Brazil

The 16th International Conference on Autonomous Agents and Multiagent Systems, 2017

Authors: P. Hernandez-Leal, Y. Zhan, M. E. Taylor, L. E. Sucar, E. Munoz de Cote

Abstract: Multi-agent systems where the agents interact among themselves and with a stochastic environment can be formalized as stochastic games. We study a subclass of these games, named Markov potential games (MPGs), that appear often in economic and engineering applications when the agents share some common resource. We consider MPGs with continuous state-action variables, coupled constraints and nonconvex rewards. Previous analysis followed a variational approach that is only valid for very simple cases (convex rewards, invertible dynamics, and no coupled constraints); or considered deterministic dynamics and provided open-loop (OL) analysis, studying strategies that consist of predefined action sequences, which are not optimal for stochastic environments. We present a closed-loop (CL) analysis for MPGs and consider parametric policies that depend on the current state and where agents adapt to stochastic transitions. We provide easily verifiable, sufficient and necessary conditions for a stochastic game to be an MPG, even for complex parametric functions (e.g., deep neural networks); and show that a closed-loop Nash equilibrium (NE) can be found (or at least approximated) by solving a related optimal control problem (OCP). This is useful since solving an OCP (which is a single-objective problem) is usually much simpler than solving the original set of coupled OCPs that form the game (which is a multiobjective control problem). This is a considerable improvement over the previously standard approach for the CL analysis of MPGs, which gives no approximate solution if no NE belongs to the chosen parametric family, and which is practical only for simple parametric forms. We illustrate the theoretical contributions with an example by applying our approach to a noncooperative communications engineering game. We then solve the game with a deep reinforcement learning algorithm that learns policies that closely approximate an exact variational NE of the game.

Multi-agent Systems

Reinforcement Learning

Opponent Modelling

Game Theory

Book title: Applications for Future Internet (International Summit, AFI 2016)

May 25 - 28, 2016 - International Summit, AFI, Puebla, Mexico

Authors: Enrique Sucar (INAOE, Mexico), Oscar Mayora (CREATENET, Italy), Enrique Munoz de Cote (PROWLER.io, UK)

Abstract: This book constitutes the refereed proceedings of the International Summit on Applications for Future Internet, AFI 2016, held in Puebla, Mexico, in May 2016.

The 21 papers presented were carefully selected from 29 submissions and focus on the usage of Future Internet in the biological and health sciences as well as the increased application of IoT devices in fields like smart cities, health and agriculture.

Multi-agent Systems

IoT

Sensor Networks
