The secret to scalable reinforcement learning

Our NeurIPS 2018 paper solves multiple RL problems, simultaneously.


Reinforcement learning is a core machine learning technique in which an algorithm, in effect, learns from trial and error. If reinforcement learning agents are ever to learn as flexibly as humans do, knowledge transfer and sample efficiency are crucial.

Our NeurIPS 2018 paper, Distributed Multitask Reinforcement Learning with Quadratic Convergence, proposes a highly scalable and efficient method for solving multiple reinforcement learning problems simultaneously. This work could be applied to use cases where the scalability of the core technology itself is the key impediment.

Not only does our method scale with the number of tasks, it also achieves sample efficiency by enabling the transfer of knowledge between problems. As a result, memory bottlenecks are avoided and fast processing speeds are achieved.

How does it work?

The work combines three fields in a novel way: reinforcement learning, multitask learning, and distributed optimisation.
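To make the combination concrete, below is a minimal sketch of a distributed Newton-style multitask update. It is an illustration under simplifying assumptions rather than the paper's algorithm: each task is reduced to a hypothetical quadratic surrogate objective, and the workers' gradients and Hessians are combined by plain averaging instead of the SDD-based solver the paper develops. The names make_task and local_grad_hess are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(dim):
    # Hypothetical per-task objective f_t(w) = 0.5 w^T A_t w - b_t^T w,
    # a stand-in for a task's policy-optimisation objective.
    M = rng.normal(size=(dim, dim))
    A = M @ M.T + dim * np.eye(dim)  # symmetric positive definite Hessian
    b = rng.normal(size=dim)
    return A, b

def local_grad_hess(A, b, w):
    # Each worker differentiates only its own task's objective.
    return A @ w - b, A

dim, n_tasks = 8, 5
tasks = [make_task(dim) for _ in range(n_tasks)]
w = np.zeros(dim)  # shared multitask parameter

for step in range(5):
    # Aggregate local statistics (an all-reduce in a real distributed system).
    grads, hessians = zip(*(local_grad_hess(A, b, w) for A, b in tasks))
    g = np.mean(grads, axis=0)
    H = np.mean(hessians, axis=0)
    w = w - np.linalg.solve(H, g)  # Newton step on the aggregated objective
    print(f"step {step}: gradient norm = {np.linalg.norm(g):.3e}")
```

Because each surrogate here is quadratic, a single Newton step lands on the joint optimum; the paper's SDD-Newton method is designed to retain this Newton-like behaviour while remaining scalable in the number of tasks.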

We produced both empirical and theoretical results. On the theory side, we showed quadratic convergence to an optimum, one of the fastest convergence rates known in the literature. On the empirical side, we demonstrated control of hundreds of dynamical systems while boosting initial performance.
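For readers unfamiliar with the term, quadratic convergence means the distance to the optimum is roughly squared at each iteration. This is the standard textbook definition rather than notation taken from the paper:

```latex
\|x_{k+1} - x^{\star}\| \le C \, \|x_k - x^{\star}\|^{2}
```

So an error of $10^{-2}$ at one iteration shrinks to about $10^{-4}$ at the next, which is why Newton-type methods typically need very few iterations.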

Jump-start improvement

Our experiments show that our SDD-Newton method can outperform other multitask reinforcement learning methods in jump-start performance, the return an agent achieves at the very start of learning on a new task, while requiring fewer iterations.

What’s the impact of this work?

Reinforcement learning is an important field of machine learning in which learning agents solve sequential decision-making tasks. A key challenge is designing algorithms that are both scalable and data efficient. We provide a technique that helps other researchers scale their solutions to hundreds of problems by allowing knowledge to be reused across tasks.

This paper brings fundamental performance improvements to reinforcement learning algorithms. These, in turn, could improve applications across multiple sequential decision-making domains such as robotics, autonomous vehicles, stock trading, supply chain and logistics.

Find out more

Eight research papers from PROWLER.io and our regular collaborators at world-leading institutions were accepted at the NeurIPS 2018 machine learning research conference.

You can read the abstract here and access the full paper, Distributed Multitask Reinforcement Learning with Quadratic Convergence, here.
