Per-Decision Option Discounting



ICML 2019, June 9–15, Long Beach, USA

Authors: Anna Harutyunyan (DeepMind), Peter Vrancx, Philippe Hamel (DeepMind), Ann Nowé (Vrije Universiteit Brussel), and Doina Precup (DeepMind)

Abstract: In order to solve complex problems, an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modelled through options, offers the ability to reason at many time scales, but the horizon length is still determined by the single discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that allows the agent’s horizon to grow naturally as its actions become more complex and extended in time. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.
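The contrast the abstract draws can be illustrated with a toy sketch. The function and variable names below are assumptions for illustration, not from the paper, and the sketch deliberately omits within-option discounting to isolate the key idea: a standard return discounts every primitive time step, while an option-step return applies the discount once per option decision, so temporally extended options stretch the effective horizon.

```python
# Toy illustration (all names are assumptions, not the paper's notation):
# contrast per-(time-)step discounting with per-(option-)decision discounting.

def per_step_return(rewards, gamma):
    """Standard return: discount every primitive time step."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

def option_step_return(option_rewards, gamma):
    """Option-step return: discount once per option decision.
    Each element of option_rewards is the total reward accumulated
    during one option (within-option discounting omitted for clarity)."""
    return sum(gamma**k * r for k, r in enumerate(option_rewards))

rewards = [1.0] * 8                               # 8 primitive steps
options = [sum(rewards[i:i + 4]) for i in (0, 4)]  # two 4-step options

print(per_step_return(rewards, 0.9))     # ~5.695: horizon tied to steps
print(option_step_return(options, 0.9))  # 7.6: horizon tied to decisions
```

With the same trajectory, the option-step return weights the second option's rewards by gamma once rather than gamma**4, which is the sense in which the agent's horizon grows as its actions become more extended in time.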

Reinforcement Learning

Planning Horizon

Learning


See paper
