Three research papers accepted to AAAI!

We are delighted to announce that the following three papers have been accepted to the 32nd AAAI Conference on Artificial Intelligence (AAAI-18).

Since 1979, the Association for the Advancement of Artificial Intelligence has been one of the top promoters of research in artificial intelligence and of scientific exchange among AI researchers, practitioners, scientists, and engineers in affiliated disciplines. This year's conference received a record number of submissions, over 3,800, and accepted only 933 papers.

Decentralised Learning in Systems with Many, Many Strategic Agents

(link not yet available)

Authors: David Mguni, Joel Jennings, Enrique Munoz de Cote

Abstract: Although multi-agent reinforcement learning can tackle systems of strategically interacting entities, it currently fails in scalability and lacks rigorous convergence guarantees. Crucially, learning in multi-agent systems can become intractable due to the explosion in the size of the state-action space as the number of agents increases. In this paper, we propose a method for computing closed-loop optimal policies in multi-agent systems that scales independently of the number of agents. This allows us to show, for the first time, successful convergence to optimal behaviour in systems with an unbounded number of interacting adaptive learners. Studying the asymptotic regime of N-player stochastic games, we devise a learning protocol that is guaranteed to converge to equilibrium policies even when the number of agents is extremely large. Our method is model-free and completely decentralised so that each agent need only observe its local state information and its realised rewards. We validate these theoretical results by showing convergence to closed-loop Nash-equilibrium policies in applications from economics and control theory with thousands of strategically interacting agents.

Tags: AAAI, Game Theory, Multiagent Systems, Reinforcement Learning
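The abstract's core setting, in which each agent observes only its own local information and realised reward, can be pictured with a minimal independent-learner sketch. This is our toy illustration of decentralised, model-free learning in a congestion game, not the paper's algorithm: every name and parameter here is hypothetical.

```python
import random

# Toy illustration (not the paper's algorithm): N independent
# learners in a two-resource congestion game. Each agent sees only
# its own realised reward, never the global state or other agents.
N = 100            # number of agents
ACTIONS = [0, 1]   # two congestible resources
ALPHA, EPS = 0.1, 0.1

q = [[0.0, 0.0] for _ in range(N)]  # per-agent action values

def step(rng):
    # epsilon-greedy action selection from purely local values
    acts = [rng.choice(ACTIONS) if rng.random() < EPS
            else max(ACTIONS, key=lambda a: q[i][a])
            for i in range(N)]
    load = [acts.count(a) for a in ACTIONS]
    for i, a in enumerate(acts):
        r = 1.0 / load[a]                 # congestion: shared resource
        q[i][a] += ALPHA * (r - q[i][a])  # local Q-learning update
    return load

rng = random.Random(0)
for _ in range(2000):
    load = step(rng)
# The agents end up split roughly evenly between the two resources,
# the equilibrium of this congestion game.
```

Note that the update loop scales linearly in the number of agents because no agent conditions on the joint state; the scalability result in the paper is of course far stronger than this toy suggests.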

Learning with options that terminate off-policy

(link not yet available)

Authors: Anna Harutyunyan (Vrije Universiteit Brussel), Peter Vrancx, Pierre-Luc Bacon (McGill University), Doina Precup (McGill University), Ann Nowe (Vrije Universiteit Brussel)

Abstract: A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optimal policy exactly, shorter options offer more flexibility and can yield a better solution. Thus, the termination condition puts learning efficiency at odds with solution quality. We propose to resolve this dilemma by decoupling the behavior and target terminations, just like it is done with policies in off-policy learning. To this end, we give a new algorithm, Q(β), that learns the solution with respect to any termination condition, regardless of how the options actually terminate. We derive Q(β) by casting learning with options into a common framework with well-studied multi-step off-policy learning. We validate our algorithm empirically, and show that it holds up to its motivating claims.

Tags: AAAI, Reinforcement Learning, Representation Learning
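The abstract says the termination condition "roughly determines" an option's length. A minimal sketch makes this concrete: with a constant termination probability beta, option durations are geometrically distributed, so the mean duration is 1/beta. This is a standard fact we use for illustration, not the paper's Q(β) algorithm.

```python
import random

def run_option(beta, rng, max_steps=10_000):
    """Simulate one option execution with constant termination
    probability beta; return how many steps it ran."""
    steps = 1
    while rng.random() >= beta and steps < max_steps:
        steps += 1
    return steps

rng = random.Random(0)
means = {}
for beta in (0.5, 0.1):
    # average duration over many executions approaches 1/beta
    means[beta] = sum(run_option(beta, rng) for _ in range(20_000)) / 20_000
```

Here the trade-off is visible: a small beta yields long options (efficient multi-step learning), a large beta yields short, flexible ones. Decoupling the behavior termination from the target termination, as the paper proposes, lets learning evaluate one termination scheme while executing another.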

Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets

Authors: Denis Steckelmacher (Vrije Universiteit Brussel), Diederik M. Roijers (Vrije Universiteit Brussel), Anna Harutyunyan (Vrije Universiteit Brussel), Peter Vrancx, Hélène Plisnier (Vrije Universiteit Brussel), Ann Nowe (Vrije Universiteit Brussel)

Abstract: Many real-world reinforcement learning problems have both a hierarchical nature, and exhibit a degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options),  this paper proposes an integrated approach to deal with both issues at the same time.  To achieve this, we extend the options framework for hierarchical learning to make the option initiation sets conditional on the previously-executed option. We show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers. We also empirically demonstrate that OOIs are much more sample-efficient than using a recurrent neural network over options, and illustrate the flexibility of OOIs regarding the amount of domain knowledge available at design time.

Tags: AAAI, Reinforcement Learning, Representation Learning
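The idea of making initiation sets conditional on the previously-executed option can be sketched in a few lines. The scenario and all option names below are hypothetical, invented purely for illustration; this is not the paper's implementation. The point is that the identity of the last option can carry one bit of memory, which is how OOIs can match the expressiveness of a finite state controller.

```python
# Hypothetical cue-then-corridor task: the agent sees a cue once,
# then must later turn the matching way without any recurrent state.
# Each entry maps "previously executed option" -> options whose
# execution may start next (the Option-Observation Initiation Sets).
INIT_SETS = {
    None:          {"observe_cue"},               # start of episode
    "observe_cue": {"remember_L", "remember_R"},
    "remember_L":  {"go_left"},    # initiable only if the cue was L
    "remember_R":  {"go_right"},   # initiable only if the cue was R
}

def select(prev_option, cue):
    allowed = INIT_SETS[prev_option]
    if prev_option == "observe_cue":
        # the cue is visible only here; the chosen option records it
        choice = "remember_L" if cue == "L" else "remember_R"
    else:
        choice = next(iter(allowed))  # only one option is initiable
    assert choice in allowed
    return choice

prev, trace = None, []
for _ in range(3):
    prev = select(prev, cue="L")
    trace.append(prev)
print(trace)  # ['observe_cue', 'remember_L', 'go_left']
```

Choosing "remember_L" versus "remember_R" stores the observation in the option identity itself, so the later left-or-right decision needs no recurrent network, which is the intuition behind the sample-efficiency comparison in the abstract.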