Large-scale Distributed Optimisation for Machine Learning and Beyond

back to our blogs

Large-scale Distributed Optimisation for Machine Learning and Beyond

Rasul and Haitham

By Rasul Tutunov (Senior Machine Learning Scientist) and Haitham Bou Ammar (Reinforcement Learning Team Lead)

In order to fulfil its mission of solving A.I. decision-making in complex systems, is tackling head on one of the most widespread problems in computing: the need for techniques that quickly find optimal solutions using limited resources. One approach inspired by human intelligence is the idea that complex thinking and decision-making are often distributed.

Since humans have direct access only to their own thinking processes, we naturally tend to think of intelligence as something that occurs inside single brains or minds. But the past few decades of cognitive science research have seen the emergence of other paradigms of thought. We’ve learned that our brains actually process perceptions, information, decisions and actions in a multitude of subsystems simultaneously. Some of our systems provide quick intuitive answers, others allow for slow rational deliberation. Much of our thinking seems to be embodied in other parts of our nervous systems or even extended outwards to the tools we use, the ideas others share with us and the culture we learn from and contribute to. In short, especially in complex systems and societies, thinking never happens in isolation.

Yet current Machine Learning techniques remain largely isolated and centralised. Machine Learning (ML) is a sub-field of computer science dedicated to deriving complex predictive models that can discover useful information, suggest conclusions and support decision-making in a variety of fields including finance, logistics, and robotics. Unlike standard hard-coded programming, ML uses a general architecture of computer code that can accept inputs, derive outputs, and optimize parameters through a self-tuning process. Tuning free parameters, however, is an increasingly tough challenge due to the proliferation of available data. Memory and computational requirements are growing much faster than processing speeds, making traditional solutions inefficient. State of the art deep neural networks, for example, operate with millions of parameters and can be very slow to train.

But what if we apply a “divide and conquer” paradigm to this hard computational problem? In our paper, using a method called distributed optimisation, we split data across multiple processing units and target a global solution by collaboratively solving more tractable local sub-problems. This approach has multiple benefits: it's scalable, cost-efficient and robust.

Optimisation Body

A depiction of our distributed framework: as all the agents search for an optimal solution, each agent shares its estimates with its immediate neighbours.

Distributed optimization already provides a rich set of algorithms that rely on communication between processors to determine free parameters in a decentralized way. Among the many possible approaches (e.g., distributed averaging, coordinate descent, incremental methods etc.), the fastest distributed optimization techniques are currently based on the application of second-order (Newton) methods. Though appealing, computing the Newton direction in a distributed fashion is challenging because one needs to invert the Hessian, which still requires global information.

In our work, we derive a novel connection between the Hessian of a distributed optimization problem and Symmetric Diagonally Dominant (SDD) matrices. are the first to suggest a fully distributed algorithm for solving symmetric diagonally dominant systems. By properly exploiting the curvature information stored in the Hessian, we converge to a consensus between local subproblems at a quadratic rate — the fastest speed shown in the literature so far.

Optimisation Animation Med

A comparison to two state of the art algorithms: our Distributed Newton Method (DNM) finds the optimal solution by locating and sharing the direction of descent faster and more efficiently than both Alternating Direction Method of Multipliers (ADMM) and Network Newton (NN).

To validate our theoretical achievement of these convergence speeds, we evaluated our method against five state-of-the-art approaches on a variety of machine learning problems. We chose these applications because they are inspired by real-world problems and show that our approach could be used for multitask decision-making in a wide range of systems, including autonomous helicopters and humanoid robots.

For the first time, we were able to successfully show transfer between thousands of reinforcement learning control problems:

Large Scale Distributed Optimisation Graph 1

When we compare both times and iterations to convergence for 1000 Helicopters (HC) and 500 Humanoid Robots (HR), our approach outperforms all competing solutions (shorter is better).

Large Scale Distributed Optimisation Graph 2

Our method (DNM) achieves far greater accuracy at a much lower communication cost.

Large Scale Distributed Optimisation Graph 3

Our approach shows a significant percentage improvement over state-of-the-art techniques in a range of applications inspired by real world problems including Helicopter (HC), Simple Mass system (SM), Cart Pole (CP), Double Cart Pole (DCP) and Double Mass System (DM).

This solution to one of the most widespread issues in Machine Learning promises to ensure that artificial minds don't miss out on a tool that human financial analysts, system designers or robotics engineers take for granted: distributed thinking.