Social AI: a principled decision making paradigm

By Enrique Muñoz de Cote

Intelligent agents are all around us: under Netflix’s movie suggestions; behind Siri’s weather forecasts; inside car suspensions and support systems; above us in air traffic control systems. But they’re isolated; most can only work in narrow settings that don't require interaction with other agents. Such simple scenarios are increasingly rare. In the near future, we’ll be tackling complex interactive problems that require Social AI – a powerful new toolset that prescribes how agents interact.

Autonomous cars that once could only navigate in carefully controlled environments are evolving. They’re moving into dynamic open spaces with many moving parts, like other cars, pedestrians or bicycles. Social AI will be needed to cope with the ensuing complexity. Similarly, with so many startups developing intelligent devices, the Internet of Things (IoT) will need Social AI to get them interacting effectively.

We’re working on making IoT nodes intelligent enough to coordinate with their peers and reach consensus without needing to defer to a central authority. Not only will this empower agents to do simple jobs like identifying faulty sensors on their own, it will allow systems to scale up to much more complex challenges like coordinating a fleet of twenty thousand taxis. Agents will not only be able to coordinate with familiar peers, they’ll understand the intentions of very different agents from other suppliers and negotiate with them over individual and common problems.

Social AI will enhance multi-agent systems (MAS) – which focus on issues associated with societies of self-interested agents (Wooldridge, 2009) – so that they can be used in situations that go beyond the well-aligned, common-goal team games that are currently handled by Distributed AI (DAI). This new framework can not only resolve problems with no common goals but can also help in situations where nothing can be assumed about agents' intentions in the first place. And while designing a DAI solution is usually a daunting exercise, using a multi-agent framework enables designers to break the task down into easier subtasks that can be assigned to and handled by each agent independently.

That’s the easy part. Orchestrating all the parts – getting them to work seamlessly together – will require more advanced techniques. That’s where the theory of strategic thinking, game theory, comes in.

Game theory (GT) uses mathematical models of conflict and cooperation between self-interested rational decision makers. The GT framework allows systems designers to tie strategies that benefit individual agents to ones that benefit the system as a whole. This requires coordination, cooperation and even competition to work. But is it possible for agents to coordinate or cooperate with agents that don’t share common goals?

Coordination and cooperation

Coordination is the process by which two or more agents align their actions to ensure coherent system (team) behaviour. It’s useful for several reasons:

  • it satisfies global system constraints;
  • it establishes dependencies between agents' actions;
  • it allows for concurrent decision-making processes;
  • it allows for sub-task (hierarchical) problem-solving.

Cooperation and coordination are emergent properties of multi-agent interactions, but they will only emerge if they help agents achieve their own individual goals. That said, a subfield of game theory – mechanism design – can help by providing incentives for agents to align with common goals. It’s like making targeted payments to foster desired behaviours, such as cooperation.
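To make the idea of targeted payments concrete, here is a minimal sketch using a Prisoner’s Dilemma. The payoff numbers and the `bonus` parameter (a payment the designer makes to any agent that cooperates) are illustrative assumptions, not values from a real system. The sketch checks which action is a best response to everything the other agent might do, with and without the payment.

```python
# Row player's payoffs in a Prisoner's Dilemma (illustrative numbers).
# Actions: "C" = cooperate, "D" = defect. Key: (my_action, other_action).
PAYOFF = {
    ("C", "C"): 3.0,
    ("C", "D"): 0.0,
    ("D", "C"): 5.0,
    ("D", "D"): 1.0,
}
ACTIONS = ("C", "D")


def best_response(other_action, bonus=0.0):
    """Return the action that maximises payoff against other_action.

    `bonus` is a hypothetical payment the designer makes to any agent
    that cooperates, whatever the other agent does.
    """
    def utility(action):
        extra = bonus if action == "C" else 0.0
        return PAYOFF[(action, other_action)] + extra

    return max(ACTIONS, key=utility)


def dominant_action(bonus=0.0):
    """Return the action that is a best response to every opponent action,
    or None if no single action is."""
    responses = {best_response(other, bonus) for other in ACTIONS}
    return responses.pop() if len(responses) == 1 else None


if __name__ == "__main__":
    print(dominant_action(bonus=0.0))  # no payment
    print(dominant_action(bonus=2.5))  # payment larger than the gain from defecting
```

With these numbers, defecting is dominant when there is no payment, and cooperating becomes dominant once the bonus exceeds the largest gain from defecting (here 2). A smaller bonus leaves no dominant action, which is why the size of the incentive matters.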

These techniques help us design autonomous AI agents that can truly interact with humans and other forms of AI. Let’s look at some examples.

Supply and demand as a multi-agent problem

Suppose we have on the one hand a fleet of taxi drivers (the supply) who are self-interested rational decision-makers, each with individual objectives. On the other hand, we have city dwellers requesting rides (the demand). At any given time, there are a number of calls requesting travel from a pickup to a drop-off location. This is the scenario that companies like Uber, Lyft and Grab face. It’s a classic example of how individual objectives (of the drivers) might not be well aligned with system objectives (of the company). The consequence of this misalignment is well captured by a measure known in game theory as “the price of anarchy” (PoA) (Koutsoupias & Papadimitriou, 2009). This defines the effectiveness of the system as the ratio between the worst possible Nash equilibrium and the social optimum. In other words, it’s the price to be paid (in terms of system efficacy) for letting self-interested agents decide on their own what to do.
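The price of anarchy can be computed by brute force for a toy version of the taxi scenario. In the sketch below, the two zones, their fares and the rule that drivers in the same zone split the fare are all hypothetical. Because the game is stated in terms of earnings rather than costs, the ratio is reported as optimal welfare divided by worst-equilibrium welfare, so a value of 1 means no efficiency is lost.

```python
from itertools import product

ZONES = ("A", "B")
# Hypothetical fare available in each zone per time step.
FARE = {"A": 10.0, "B": 4.0}


def payoffs(profile):
    """Each driver's expected fare; drivers in the same zone split it."""
    return tuple(FARE[zone] / profile.count(zone) for zone in profile)


def is_nash(profile):
    """True if no driver gains by unilaterally switching zone."""
    current = payoffs(profile)
    for i in range(len(profile)):
        for zone in ZONES:
            deviation = profile[:i] + (zone,) + profile[i + 1:]
            if payoffs(deviation)[i] > current[i]:
                return False
    return True


def price_of_anarchy(n_drivers=2):
    """Optimal total earnings divided by the worst pure-equilibrium earnings."""
    profiles = list(product(ZONES, repeat=n_drivers))
    welfare = {p: sum(payoffs(p)) for p in profiles}
    optimum = max(welfare.values())
    # Assumes at least one pure equilibrium exists, which holds for this game.
    worst_equilibrium = min(welfare[p] for p in profiles if is_nash(p))
    return optimum / worst_equilibrium


if __name__ == "__main__":
    print(price_of_anarchy(n_drivers=2))
```

With two drivers, both crowd into zone A because half of the large fare still beats the small one, so the system collects 10 when it could collect 14.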

Traffic, a phenomenon that results from selfish use of common resources, is another example of strategic conflict known in the GT literature as a “selfish routing problem”. In this case, since drivers seek to shorten their commutes, fast roads get overused because it’s in everyone’s self-interest to use them. This can produce interestingly counter-intuitive problems – and solutions. Braess' paradox is my favourite example of this: it states that building new roads can increase traffic congestion and closing existing ones can decrease it.
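Braess' paradox can be checked numerically on the standard four-node textbook network. The numbers below (4000 drivers, two congestible links that take load/100 minutes, two fixed 45-minute links and a free shortcut) are the usual classroom values and are an assumption here, not data from a real city. The sketch tests whether a given split of drivers is an equilibrium and compares average commute times.

```python
N_DRIVERS = 4000  # textbook value, used here as an assumption


def route_times(flows):
    """Travel time (minutes) on each route, given drivers per route.

    Network: Start->A and B->End take load/100 minutes (congestible),
    Start->B and A->End take a fixed 45 minutes, and the optional
    A->B shortcut is free. Routes:
      "top":      Start -> A -> End
      "bottom":   Start -> B -> End
      "shortcut": Start -> A -> B -> End
    """
    top = flows.get("top", 0)
    bottom = flows.get("bottom", 0)
    shortcut = flows.get("shortcut", 0)
    start_a = (top + shortcut) / 100
    b_end = (bottom + shortcut) / 100
    return {
        "top": start_a + 45,
        "bottom": 45 + b_end,
        "shortcut": start_a + b_end,
    }


def is_equilibrium(flows, available):
    """Wardrop condition: no used route is slower than any available route."""
    times = route_times(flows)
    fastest = min(times[route] for route in available)
    return all(
        times[route] <= fastest
        for route in available
        if flows.get(route, 0) > 0
    )


def commute(flows):
    """Average travel time over all drivers."""
    times = route_times(flows)
    total = sum(times[route] * n for route, n in flows.items())
    return total / sum(flows.values())


without_shortcut = {"top": N_DRIVERS // 2, "bottom": N_DRIVERS // 2}
with_shortcut = {"shortcut": N_DRIVERS}

if __name__ == "__main__":
    print(commute(without_shortcut), commute(with_shortcut))
```

Before the shortcut exists, an even split is stable at 65 minutes. Once it opens, the even split stops being an equilibrium because the shortcut looks faster to each driver, and the only stable outcome has everyone on it at 80 minutes.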

Both the selfish taxi assignment and traffic are cases of selfish routing that belong to the game theoretic class of problems known as the tragedy of the commons. Hundreds of papers have studied such problems, notably with the Prisoner’s Dilemma game. When this type of problem is fed into classic GT frameworks, the theory generally predicts a high PoA solution that is detrimental from a system point of view – all the players look out for their own interests and the system as a whole suffers. Interestingly, in behavioural game theory – which studies interactions between humans – people generally find ways to avoid such PoA breakdowns of the system. By adapting human strategies, which in classic GT are often dismissed as mere preferences or irrational, we are finding ways to develop low PoA solutions for machine-only and mixed human-machine settings.

Machines learning with game theory

Our team is developing autonomous systems and seeks to avoid imposing behaviours on agents. If machines are going to help minimise the PoA curse, they’re going to need to learn on their own how to do so. Rather than being given algorithms that make assumptions about rationality and preferences, they’ll learn to make optimal decisions (called “best responses” in game theory) by interacting with each other and considering their counterparts in a game theoretic way.

But forgoing assumptions about rationality and preferences in order to learn long-term plans in a shared environment comes at a price. Since an agent must learn how to act while considering how other agents do the same, each agent’s reinforcement learning (RL) process can become entangled with all the other RL processes. The result can be cyclic, even chaotic, learning dynamics (Lazaric, Munoz de Cote, Dercole, & Restelli, 2008). Some scientists have tried to address this issue head-on by designing algorithms that can adjust to different conditions by comparing their performance to a baseline (Bowling & Veloso, 2002; Hernandez-Leal, Zhan, Taylor, Sucar, & Munoz de Cote, 2016) or that can themselves induce stationary conditions (Munoz de Cote, Lazaric, & Restelli, 2006). Others try to develop solutions tailored to specific settings, like zero-sum (Littman, 2001), team or potential games. We believe probabilistic models can make a big difference here, by enabling agents to model other agents' behaviour and preferences.
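A toy illustration of why entangled adaptation can cycle: the sketch below is not the RL analysis from the cited papers. It swaps reinforcement learning for myopic best response in the matching pennies game, where one agent wants to match the other's coin and the other wants to mismatch. All function names are hypothetical.

```python
def best_response_row(col_action):
    """Row player wins by matching the column player's coin."""
    return col_action


def best_response_col(row_action):
    """Column player wins by mismatching the row player's coin."""
    return 1 - row_action


def simulate(start, steps):
    """Both agents simultaneously best-respond to the other's last action."""
    history = [start]
    row, col = start
    for _ in range(steps):
        row, col = best_response_row(col), best_response_col(row)
        history.append((row, col))
    return history


trajectory = simulate(start=(0, 0), steps=8)

if __name__ == "__main__":
    for step, joint_action in enumerate(trajectory):
        print(step, joint_action)
```

Each agent's target moves whenever the other adapts, so the joint action visits all four combinations and returns to its starting point every four steps without ever settling. Learning algorithms with memory and exploration behave less mechanically, but they face the same moving-target problem.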

Social AI is transparent AI

A MAS framework is one of our preferred decision-making techniques because it helps us to see and understand solutions found by the algorithms. It enables us, for example, to break a task down into more easily understood subtasks. And when something changes in the environment that impacts only a fraction of the agents, only those agents need to adapt. This contrasts with some current approaches that use data-hungry techniques (like deep learning) where large networks need to be retrained from scratch even after a small environmental change. Furthermore, because one can always observe how the pieces interact, the MAS framework is more transparent. We can easily query it with questions like: Which agents are interacting? What are they communicating? Are their solutions coordinated?

What’s next?

Some of the upcoming challenges for Social AI are engineering challenges like protocols and standardisation. But the field is young; since its algorithms deviate from classic GT thinking, they raise some profound scientific questions. How is cooperation between agents possible when goals are not aligned? How should single-agent plans be coordinated? What information needs to be shared so agents can efficiently coordinate? What incentives and rules need to be put in place to align agent goals? What would equilibrium solutions look like? Do we need new conceptual solutions?

These questions and many more have yet to be fully addressed. We’ve only explored the tip of the iceberg. Fortunately, game theory has a wealth of literature and diverse ideas. Plenty of useful answers lie below the surface.


Bowling, M., & Veloso, M. (2002). Multiagent learning using a variable learning rate. Artificial Intelligence, 136(2), 215–250.

Hernandez-Leal, P., Zhan, Y., Taylor, M. E., Sucar, L. E., & Munoz de Cote, E. (2016). An exploration strategy for non-stationary opponents. Autonomous Agents and Multi-Agent Systems, 1–32.

Koutsoupias, E., & Papadimitriou, C. (2009). Worst-case equilibria. Computer Science Review, 3(2), 65–69.

Lazaric, A., Munoz de Cote, E., Dercole, F., & Restelli, M. (2008). Bifurcation Analysis of Reinforcement Learning Agents in the Selten’s Horse Game. Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning (Vol. 4865, pp. 129–144).

Littman, M. L. (2001). Friend-or-foe Q-learning in general-sum Games. International Conference on Machine Learning.

Munoz de Cote, E., Lazaric, A., & Restelli, M. (2006). Learning to cooperate in multi-agent social dilemmas. In Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems (AAMAS) (pp. 783–785). New York, NY, USA: ACM.

Wooldridge, M. (2009). An Introduction to MultiAgent Systems. John Wiley & Sons.
