AAAI Symposium


AAAI Symposium @ Stanford, March 26th-28th, 2018

Data-Efficient Reinforcement Learning


Sequential decision making (SDM) is an essential component for autonomous systems. Although significant progress has been made towards developing algorithms for solving isolated SDM tasks, these algorithms often require large amounts of experience before achieving acceptable performance. This is particularly true for high dimensional tasks, such as robotics control or general game playing environments.

Multiple approaches have been proposed for data-efficient reinforcement learning algorithms that generalize well to previously unobserved environments or situations. These include imitation learning, lifelong learning, multi-task learning, transfer learning, and model-based reinforcement learning.

This symposium will give reinforcement learning researchers an opportunity to present their work and to discuss potential solutions to these challenging problems. We solicit papers that present novel results and algorithms, with an emphasis on approaches that are theoretically grounded. We are particularly interested in RL benchmarking and applications in various fields such as robotics, medicine and game playing, as well as real-world, large-scale applications. Submissions on inverse reinforcement learning are also welcome. Topics of interest include, but are not limited to:

  1. Reinforcement Learning
  2. Model-Based Reinforcement Learning
  3. Inverse Reinforcement Learning
  4. Deep Reinforcement Learning
  5. Transfer Learning
  6. Multi-Task Reinforcement Learning
  7. Lifelong Reinforcement Learning
  8. Large-Scale Applications of Reinforcement Learning
  9. Novel Applications of Reinforcement Learning
  10. Benchmarking Reinforcement Learning Algorithms
  11. Learning from Demonstration
  12. Imitation Learning

Symposium Schedule

The symposium will include invited talks, presentations of accepted papers and discussions over three days.

Monday, March 26
  • 10:00 am – 10:30 am: Opening and welcome to Data-Efficient Reinforcement Learning by Haitham Bou Ammar
  • 10:30 am – 12:15 pm: Invited talk by Warren Powell
  • 12:30 pm – 2:00 pm: Lunch break
  • 2:15 pm – 2:45 pm: Inverse Reinforcement Learning via Nonparametric Subgoal Modelling
  • 2:50 pm – 3:20 pm: State Abstraction Synthesis for Discrete Models of Continuous Domains
  • 3:20 pm – 4:00 pm: Bayesian Q-Learning with Assumed Density Filtering
  • 4:05 pm – 5:00 pm: Invited talk by Han Liu

Tuesday, March 27
  • 10:00 am – 10:25 am: Hierarchical Approaches for Reinforcement Learning in Parameterised Action Spaces
  • 10:30 am – 11:00 am: Multi-agent Soft Q-Learning
  • 11:15 am – 12:15 pm: Invited talk by Peter Stone
  • 12:30 pm – 2:00 pm: Lunch break
  • 2:15 pm – 2:45 pm: Efficient Exploration for Constrained MDPs
  • 3:00 pm – 4:00 pm: Invited talk by Jan Peters (to be confirmed)

Wednesday, March 28
  • 10:00 am – 10:30 am: Towards a Data Efficient Off-Policy Policy Gradient
  • 10:35 am – 11:05 am: Run, Skeleton, Run: Skeletal Model in Physics-Based Simulation
  • 11:10 am – 12:15 pm: Talk by Haitham Bou Ammar

Invited Speakers

Prof. Warren B. Powell is a professor in the Department of Operations Research and Financial Engineering at Princeton University, where he has taught since 1981 after receiving his BSE from Princeton University and his Ph.D. from MIT. He is the founder and director of the laboratory for Computational Stochastic Optimization and Learning (CASTLE Labs), which spans contributions to models and algorithms in stochastic optimization, with applications to energy systems, transportation, health and medical research, business analytics and the laboratory sciences. He has pioneered the use of approximate dynamic programming for high-dimensional applications in freight transportation, where his projects have twice been recognized as Edelman finalists and one won the Daniel Wagner Prize. This research led him to the field of optimal learning for optimizing expensive functions using the knowledge gradient. A Fellow of INFORMS, he has served in a range of service positions spanning the Society for Transportation and Logistics, the INFORMS Computing Society, and the INFORMS Optimization Society. He has published two books and over 200 papers, and is currently working on a new book, “Optimization under Uncertainty: A Unified Framework.” He has supervised 50 graduate students and post-docs, and almost 200 undergraduates.

Abstract: A Unified View on Stochastic Optimisation

Stochastic optimization is a fragmented field comprising multiple communities from within operations research (stochastic programming, Markov decision processes, simulation optimization, decision analysis, bandit problems), computer science (reinforcement learning, bandit problems), optimal control (stochastic control, model predictive control, online computation), and applied mathematics (stochastic search). In this talk, I will identify the major dimensions of this rich class of problems, spanning static to fully sequential problems, offline and online learning (including so-called “bandit” problems), and derivative-free and derivative-based algorithms, with attention given to problems with expensive function evaluations. I will then give a common mathematical framework for modelling all of these problems using a single formulation. This framework consists of five fundamental elements (states, decisions/actions/controls, exogenous information, transition function and objective function), and requires optimizing over policies, which is the major point of departure. We divide solution strategies for sequential problems (“dynamic programs”) between stochastic search (“policy search”) and policies based on lookahead approximations (which include both stochastic programming and value functions based on Bellman’s equations). We further divide each of these two fundamental solution approaches into two subclasses, producing four (meta)classes of policies for approaching sequential stochastic optimization problems. We use a simple energy storage example to demonstrate that each of these four classes may work best, as well as opening the door to a range of hybrid policies. The ultimate goal of the tutorial is to put all of these problems into a single, elegant framework that makes it possible to draw on the entire spectrum of tools developed in different settings. Every problem class, as well as each solution strategy, will be illustrated using actual applications.
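The five-element framework in the abstract is concrete enough to sketch in code. Below is a minimal, illustrative Python rendering of the energy-storage example: state, decision, exogenous information, transition function, and objective are each made explicit, and “optimizing over policies” is reduced to simulating and comparing two simple policies. All names, thresholds, and numbers here are assumptions for illustration, not taken from the talk.

```python
import random

# Illustrative sketch (not from the talk) of the five-element model of a
# sequential decision problem, applied to a toy energy-storage problem:
#   state                 - (storage level, current price)
#   decision              - how much energy to buy (+) or sell (-)
#   exogenous information - the next, randomly revealed price
#   transition function   - how the state evolves
#   objective function    - accumulated trading profit

def transition(state, decision, exogenous_price):
    """Transition function: update the storage level, reveal the new price."""
    level, _ = state
    level = min(max(level + decision, 0.0), 1.0)  # respect capacity [0, 1]
    return (level, exogenous_price)

def contribution(state, decision):
    """Per-period objective: selling (decision < 0) earns price * quantity,
    buying (decision > 0) costs it."""
    _, price = state
    return -decision * price

def threshold_policy(state, buy_below=0.4, sell_above=0.6):
    """A simple policy: buy when the price is low, sell when it is high."""
    level, price = state
    if price < buy_below and level < 1.0:
        return 0.5   # charge half the capacity
    if price > sell_above and level > 0.0:
        return -0.5  # discharge half the capacity
    return 0.0       # hold

def simulate(policy, horizon=100, seed=0):
    """'Optimizing over policies' in miniature: evaluate a policy's total
    contribution along one sample path of exogenous prices."""
    rng = random.Random(seed)
    state = (0.5, 0.5)  # start half full at a mid-range price
    total = 0.0
    for _ in range(horizon):
        decision = policy(state)
        total += contribution(state, decision)
        state = transition(state, decision, rng.random())
    return total

hold = lambda state: 0.0  # benchmark policy: never trade
print(f"threshold: {simulate(threshold_policy):.2f}, hold: {simulate(hold):.2f}")
```

The threshold rule is an instance of one of the four (meta)classes mentioned in the abstract (a policy function approximation); swapping in, say, a lookahead policy would change only `threshold_policy`, not the five-element model around it.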

Prof. Peter Stone is the David Bruton, Jr. Centennial Professor and Associate Chair of Computer Science, as well as Chair of the Robotics Portfolio Program, at the University of Texas at Austin. In 2013 he was awarded the University of Texas System Regents' Outstanding Teaching Award and in 2014 he was inducted into the UT Austin Academy of Distinguished Teachers, earning him the title of University Distinguished Teaching Professor. Professor Stone's research interests in Artificial Intelligence include machine learning (especially reinforcement learning), multiagent systems, robotics, and e-commerce. Professor Stone received his Ph.D. in Computer Science in 1998 from Carnegie Mellon University. From 1999 to 2002 he was a Senior Technical Staff Member in the Artificial Intelligence Principles Research Department at AT&T Labs - Research. He is an Alfred P. Sloan Research Fellow, Guggenheim Fellow, AAAI Fellow, Fulbright Scholar, and 2004 ONR Young Investigator. In 2003, he won an NSF CAREER award for his proposed long-term research on learning agents in dynamic, collaborative, and adversarial multiagent environments; in 2007 he received the prestigious IJCAI Computers and Thought Award, given biannually to the top AI researcher under the age of 35; and in 2016 he was awarded the ACM/SIGAI Autonomous Agents Research Award. Professor Stone co-founded Cogitai, Inc., a startup company focused on continual learning, in 2015, and currently serves as President and COO.

Abstract: Curriculum Design for Reinforcement Learning (Tentative - to be announced)

Prof. Jan Peters is a full professor (W3) for Intelligent Autonomous Systems at the Computer Science Department of the Technische Universitaet Darmstadt and at the same time a senior research scientist and group leader at the Max-Planck Institute for Intelligent Systems, where he heads the interdepartmental Robot Learning Group.

Jan Peters has received the 2007 Dick Volz Best US PhD Thesis Runner-Up Award, the Robotics: Science & Systems Early Career Spotlight, the INNS Young Investigator Award, and the IEEE Robotics & Automation Society's Early Career Award. Recently, he received an ERC Starting Grant. Jan Peters studied Computer Science, Electrical, Mechanical and Control Engineering at TU Munich and FernUni Hagen in Germany, at the National University of Singapore (NUS) and at the University of Southern California (USC). He received four Master's degrees in these disciplines as well as a Computer Science PhD from USC. He has performed research in Germany at DLR, TU Munich and the Max Planck Institute for Biological Cybernetics (in addition to the institutions above), in Japan at the Advanced Telecommunication Research Institute (ATR), at USC, and at both NUS and Siemens Advanced Engineering in Singapore.

(Abstract to be announced)

Prof. Han Liu received a Joint PhD in Machine Learning and Statistics in 2011 at Carnegie Mellon University. His thesis advisors were John Lafferty and Larry Wasserman. As a computer scientist and statistician, he exploits computation and data as a lens to explore science and machine intelligence. He examines this from the point of view provided by the twin windows of modern nonparametric methods and probabilistic graphical models.

Nonparametric methods aim to draw inferences from high-dimensional data under the weakest possible assumptions, while graphical models provide a unified framework combining uncertainty (probability theory) and logical structure (graph theory) to model complex, real-world phenomena. Together they provide powerful tools for challenging problems, shedding light on the nature of machine intelligence, and, if successful, could lead to significant future applications. His specific research focuses on nonparametric structure learning and representation learning. Success in this research has the potential to revolutionize the foundation of the second generation of artificial intelligence (i.e., statistical machine learning) and push the frontier of the third generation of artificial intelligence (i.e., deep learning). He serves as an Associate Editor of the Electronic Journal of Statistics and as an area chair for NIPS, AISTATS, and ICML. His theoretical research projects include: 1) nonparametric graphical models, 2) transelliptical modelling and robust inference, 3) nonconvex statistical optimisation, 4) post-regularisation inference, 5) high-dimensional nonparametrics, and 6) fundamental limits of computational models. His applied research interest is to develop a unified set of computational, statistical, and software tools to extract and interpret significant information from data collected in a variety of scientific areas. Current projects include: 1) nonparametric graphical models for brain science and genomics, and 2) modern machine learning methods for computational finance.

Abstract: Nonparametric Representation Learning (to be announced)

Prof. James Hensman is a senior machine learning scientist in industry, where he currently leads a probabilistic modelling team. He is interested in statistical machine learning and its role in autonomous systems. Before moving to industry, Prof. Hensman was a lecturer in the CHICAS research group at Lancaster University and a visiting researcher at the University of Manchester with Magnus Rattray. Before that, he worked at the University of Sheffield with Neil Lawrence, after completing a PhD with Keith Worden. With Alex Matthews, Prof. Hensman founded the open-source Gaussian process library GPflow. Previously, he helped create GPy, another Python package for Gaussian process models. Prof. Hensman has published over 65 papers in world-leading conferences and journals, including JMLR, TPAMI, NIPS and UAI.

Abstract: Model-Based Reinforcement Learning (to be announced)

Program Committee:

Jan Peters (Technical University of Darmstadt)

Alessandro Lazaric (Inria)

Marc Deisenroth (Imperial College London)

Josiah Hanna (The University of Texas at Austin)

Kee-Eung Kim (KAIST)

Dongho Kim

James Hensman

Enrique Munoz de Cote

Peter Vrancx

Haitham Bou Ammar


Organizing Committee:

Haitham Bou Ammar

Dongho Kim

Enrique Munoz de Cote

James Hensman

Matthew E. Taylor (Washington State University)
