Society of AIs

Flexible, adaptive and competitive systems

By Enrique Munoz de Cote, Head of Multi-Agent Systems, and Sofia Ceppi, Senior Machine Learning Researcher

This is the first of three blog posts explaining the sorts of problems faced by each of our research teams, Multi-Agent Systems (MAS), Reinforcement Learning (RL) and Probabilistic Modelling (PM), and the kinds of solutions and strategies we use to solve them.

Our MAS team is primarily made up of researchers from Game Theory (GT) and Machine Learning (ML). Game Theory is a mathematical framework that studies how rational agents make decisions. We generally start from deceptively simple questions that arise whenever large numbers of rational decision-making agents come together: What can go wrong? What can go right? How can we get selfish agents to benefit the system as a whole? The theories and findings of this field have shaped microeconomic theory and have produced several Nobel laureates. Machine Learning, by contrast, comes from computer science and statistics: it uses statistical techniques to give computers the ability to "learn", that is, to improve at a given task as more data is fed in.

MAS = Game Theory + Machine Learning

From the point of view of an agent, the world can look something like this: a system where many decision-makers need to act both for themselves and together in a very complex environment.

Showing complexity within a city scene

If you’re behind the wheel in dense, complex traffic, you have to cope not just with the car and the road but with a crowd of other agents, other decision-makers. To move forward, you have to reason not only about your next action but about the sequences (and consequences) of actions that will take you to your destination. How will your actions affect other drivers? How will they react? Your decisions can come back to haunt you if other drivers don’t react as expected.

Traffic from a machine’s perspective

We’ve all been in this position, thrown in with a lot of other decision-makers with different objectives, all acting in different ways: some behaving rationally, others less so, some even doing things that strike us as weird. We all live here. Soon a lot of AIs will live here too. It’s the sort of problem we deal with every day. And when you’re running large, complex systems where many AIs live and learn in a shared environment, you need to consider the whole learning process (not just the final outcome, as game theory might suggest) and the impact it has on the entire society of AIs, and people, in your environment.

Suppose our task is to manage a fleet of taxis in a city. How do we develop a solution that coordinates the full fleet, minimising congestion (and thus passengers’ dissatisfaction) while making the best use of available resources? To compound the problem, we’re not talking about a small fleet but thousands of taxis operating across a large city. In the MAS team, we start from the point of view of the single taxi driver, or agent, who makes a whole range of interlocking decisions: which way to turn, which route to take, which passenger to pick up, which shift to work and when to call it a day. Each decision, in turn, can affect all the other drivers in the fleet, and their passengers. Managing this fleet is never going to be the kind of problem we can solve with a single fantastic algorithm: treated as one monolithic problem, it is impossibly complex.

What our MAS team proposes is to put simpler intelligences into individual agents and design their interactions so that enough intelligence emerges from the resulting system to solve the problem. Only then will our fleet be able to act as a unified whole. This distributed approach to intelligence is what AI pioneer Marvin Minsky famously called The Society of Mind, arguing in the 1980s that in both artificial and natural minds, intelligence (even human intelligence) emerges from the interaction of many simple thinking systems he called “agents”. The distributed approach helps us tackle two of the most troublesome problems in AI: scalability and robustness. The system can keep growing as intelligent agents are added, so we can add taxis as needed. And when an individual agent breaks down, the loss of a single taxi can be offset, and the system as a whole carries on without disruption.

However, there’s a price to pay in a distributed system. Game Theory quantifies it as the “Price of Anarchy” (PoA), which measures how much the efficiency of a system degrades because of the selfish behaviour of its agents. Indeed, whenever you deal with a system of interacting, autonomous, rational intelligences pursuing their own goals, you have to tackle one big issue: selfishness. Though traffic is never fun, it can usefully be thought of as a competitive game. Any rational agent acts out of self-interest; in technical terms, it always tries to maximise its utility. For a driver in the simple example below, that means minimising travel time between cities. Here the agent has two options, and it always selects the faster, upper route.

AI choosing the simple route

But if we look at the same problem with multiple decision-makers, things start to fall apart. Each agent wants to minimise its travel time and, assuming there’s no prior communication about traffic, each makes the same choice: the quicker route. This inevitably causes congestion, and everybody is negatively affected. Most mapping apps don’t solve the problem either: they tend to suggest similar routes to all users, a problem compounded when accidents and diversions funnel drivers onto the same detours, creating further congestion.

Comparing Mobile Trip Maps

Here no one is thinking strategically about the competition between drivers for resources. Just as a mapping app that doesn't actually lower my travel time will send me to rival apps, a fleet solution that doesn't consider both the needs of individual drivers and the efficiency of the system as a whole won't survive in a very competitive industry. That's what we do: we identify the problems that occur in complex systems and find solutions that help both individual agents and the environment as a whole, making the system greater than the sum of its parts.
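
To make the Price of Anarchy mentioned above precise, here is the standard definition for games where players minimise a cost. Write S for the set of all possible combinations of route choices by every driver, NE ⊆ S for the Nash equilibria among them (combinations in which no driver can do better by changing only their own choice), and C(s) for the total travel time of all drivers under combination s:

$$
\mathrm{PoA} \;=\; \frac{\max_{s \in \mathrm{NE}} C(s)}{\min_{s \in S} C(s)} \;\ge\; 1
$$

A value of 1 means selfishness costs nothing; the larger the ratio, the more efficiency is lost to selfish behaviour.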

Let's look at a more focused example: a similar situation that may seem counterintuitive but can shed light on the problem in terms of both infrastructure planning and traffic management.

Mountain 1

Here cars try to find the quickest route from a source city on the left to a destination on the right, along roads with differing capacities; some stretches are prone to congestion depending on how many cars use them. On these congestion-prone narrow roads, the travel time in minutes equals the number of cars using them, n. How can drivers best use the infrastructure to reach their destinations? The answer is a Nash equilibrium, named after mathematician John Forbes Nash of “A Beautiful Mind” fame, one of the founding figures of Game Theory: a state in which no driver can reduce their own travel time by switching route on their own. In this solution, an equal split of the cars is the best answer: only three cars use each narrow road, congestion is avoided, and everyone arrives in 11 minutes. So far so good.
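
The arithmetic behind the 11-minute equilibrium can be checked directly. Since the figure isn’t reproduced here, this assumes six cars and wide roads that take a fixed 8 minutes, which is what 3 + 8 = 11 implies. A driver who switches to the other route joins that route’s narrow road, raising its count from three to four:

$$
\underbrace{3 + 8}_{\text{stay}} = 11 \text{ min} \;<\; \underbrace{(3 + 1) + 8}_{\text{switch}} = 12 \text{ min}
$$

Since switching only makes things worse, no driver has an incentive to deviate, which is exactly the Nash condition.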

Now imagine that planners decide to simplify things and further reduce travel time by building a superfast tunnel under the mountain, one that cuts travel between the two intermediate points to just 1 minute.

Mountain 2

What could possibly go wrong? If all our drivers do the perfectly rational thing and choose the new, faster route, travel time rises to 13 minutes as traffic piles up on the narrow roads. This is known as Braess’s paradox. It’s counterintuitive, but everybody loses because our rational decision-makers are only interested in maximising their own welfare. That’s the bad news, and it’s the sort of thing that can happen when we program intelligent, rational, individual agents, especially in environments far more realistic and complex than this one.
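
To make the paradox concrete, here is a small Python simulation. It is a sketch under stated assumptions: the figures aren’t reproduced here, so the network layout, the six cars and the 8-minute wide roads are a reconstruction inferred from the 11- and 13-minute figures in the text, and the route names are hypothetical. Each selfish car in turn switches to a strictly faster route (best-response dynamics) until no one wants to move, which is a Nash equilibrium; a brute-force search over every possible assignment of cars to routes then gives the Price of Anarchy.

```python
from fractions import Fraction
from itertools import product

# Hypothetical reconstruction of the mountain network (the figures are not shown here):
#   "upper"  = narrow road S->A (n minutes) + wide road A->T (WIDE minutes)
#   "lower"  = wide road S->B (WIDE minutes) + narrow road B->T (n minutes)
#   "tunnel" = narrow road S->A + tunnel A->B (TUNNEL minutes) + narrow road B->T
# where n is the number of cars currently on that narrow road.
N_CARS = 6   # assumption: three cars on each narrow road before the tunnel exists
WIDE = 8     # assumption: inferred from 3 + 8 = 11 minutes
TUNNEL = 1   # from the text: the tunnel takes 1 minute


def travel_times(assignment):
    """Travel time (minutes) of every car, given each car's chosen route."""
    on_sa = sum(r in ("upper", "tunnel") for r in assignment)  # cars on narrow road S->A
    on_bt = sum(r in ("lower", "tunnel") for r in assignment)  # cars on narrow road B->T
    cost = {"upper": on_sa + WIDE, "lower": WIDE + on_bt, "tunnel": on_sa + TUNNEL + on_bt}
    return [cost[r] for r in assignment]


def best_response_dynamics(routes, start):
    """Each selfish car in turn moves to its fastest route until nobody can improve."""
    assignment = list(start)
    changed = True
    while changed:
        changed = False
        for i in range(len(assignment)):
            best = travel_times(assignment)[i]
            for r in routes:
                trial = assignment[:i] + [r] + assignment[i + 1:]
                if travel_times(trial)[i] < best:  # switch only if strictly faster
                    assignment, best, changed = trial, travel_times(trial)[i], True
    return assignment


def is_nash(assignment, routes):
    """True if no single car can cut its own travel time by changing route."""
    times = travel_times(assignment)
    return all(
        travel_times(assignment[:i] + [r] + assignment[i + 1:])[i] >= times[i]
        for i in range(len(assignment))
        for r in routes
    )


def price_of_anarchy(routes):
    """Worst equilibrium total travel time divided by the best achievable total."""
    profiles = [list(p) for p in product(routes, repeat=N_CARS)]
    worst_eq = max(sum(travel_times(p)) for p in profiles if is_nash(p, routes))
    optimum = min(sum(travel_times(p)) for p in profiles)
    return Fraction(worst_eq, optimum)


before = best_response_dynamics(("upper", "lower"), ["upper"] * N_CARS)
after = best_response_dynamics(("upper", "lower", "tunnel"), before)
print(set(travel_times(before)))                       # {11}
print(set(travel_times(after)))                        # {13}
print(price_of_anarchy(("upper", "lower", "tunnel")))  # 13/11
```

Best-response dynamics is guaranteed to settle in games like this one because congestion games have a potential function, so every strictly improving switch makes progress. Under these assumptions the only equilibrium with the tunnel open is every car in the tunnel at 13 minutes, while the best achievable average is still 11 minutes, so the Price of Anarchy rises from 1 to 13/11.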


Imagine a similar problem in London, where travel from a residential area to a business district can likewise get bottlenecked by the rational choices of thousands of drivers. Fortunately, by distributing intelligence, our VUKU platform can tackle such problems on an urban or even a global scale, because the solution does not depend on the number of decision-makers. It also lets us use Mechanism Design, a subfield of Game Theory whose foundations were recognised with the 2007 Nobel Memorial Prize in Economic Sciences. Mechanism Design aims to design rules that regulate actions both among agents and between agents and the environment, in ways that align the interests of individual agents with the goals of the incentive designer. Given a model of the agents’ behaviour and rationality, we can use it first to decide what kind of incentives should guide individual agents’ actions, and then to compute the precise amount of incentive that makes the best use of resources. Returning to our earlier example, it might suggest building a toll booth at the tunnel entrance and charging one pound to use the tunnel route.

Mountain 3

However, standard work in mechanism design assumes that agents are perfectly rational, that they have access to all the information they need, and that they have enough computational resources to compute and implement an optimal solution. This rarely holds in real life: in the sorts of complex systems we serve, and even in a simple example like the one we’re describing here, it’s easy to see that these assumptions cannot hold and that we need a reality check. First of all, human agents are never perfectly rational; in economics, we say they have “bounded” rationality. They may fail to identify the fastest route and end up taking the slowest one. They may not have access to all the necessary information. And they certainly don’t have unlimited computational capacity: unlike a computer, a human can’t plan even one hundred moves ahead in chess.

Mechanism Design

We need to adapt mechanism design to bounded rationality by developing a new incentive design framework that observes the actual, often irrational behaviour of agents and adjusts accordingly. In our simple example, we may observe that a one-pound toll changes the behaviour of only one agent and thus proves insufficient. Raising the toll to three pounds may overshoot the mark, leaving too few agents using the tunnel. But we can keep learning from observation until we find the two-pound toll that distributes traffic evenly across the environment, in a way that is optimal for everyone. A Nash equilibrium is reached, even in a complex environment with less-than-rational agents.
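
The observe-and-adjust loop described above can be sketched as follows. Everything here is a hypothetical illustration, not the method used in practice: drivers are modelled with made-up “toll tolerance” thresholds (a crude stand-in for bounded rationality) that the designer never sees, and we assume “evenly distributed” means two of the six cars on each of the three routes, i.e. a target of two tunnel users. The designer only observes how many cars use the tunnel at each toll, raises the toll when there are too many, lowers it when there are too few, and halves its step whenever it overshoots.

```python
# Hypothetical bounded-rational drivers: each car uses the tunnel only while the toll
# (in pounds) is at or below its personal tolerance. These numbers are made up for
# illustration, and the incentive designer cannot see them.
TOLERANCES = [0.5, 1.5, 1.5, 1.5, 2.5, 2.5]


def observe_tunnel_users(toll):
    """What the designer can measure: how many cars choose the tunnel at this toll."""
    return sum(toll <= t for t in TOLERANCES)


def learn_toll(observe, target, toll=1.0, step=2.0, max_rounds=20):
    """Adjust the toll from observed behaviour until tunnel usage hits the target.

    Raises the toll when too many cars use the tunnel, lowers it when too few do,
    and halves the step each time the direction flips (i.e. after an overshoot).
    """
    history = []
    last_direction = 0
    for _ in range(max_rounds):
        users = observe(toll)
        history.append((toll, users))
        if users == target:
            return toll, history
        direction = 1 if users > target else -1
        if last_direction and direction != last_direction:
            step /= 2  # we overshot, so take smaller steps back
        toll = max(0.0, toll + direction * step)
        last_direction = direction
    return None, history


# Start from the one-pound toll a perfectly rational model might suggest.
toll, history = learn_toll(observe_tunnel_users, target=2)
print(history)  # [(1.0, 5), (3.0, 0), (2.0, 2)]
print(toll)     # 2.0
```

The step-halving rule assumes tunnel usage never rises as the toll rises; if no toll hits the target exactly, the loop gives up after `max_rounds` and returns `None` along with the tolls it tried.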

These sorts of divide-and-conquer approaches can be found throughout our research, and they allow us to make even very complex systems tractable and solvable. Next time we’ll see how Haitham and his team break down complex learning and modelling tasks to beat the current benchmarks in Reinforcement Learning.
