by Felix Leibfried (Senior Machine Learning Researcher), Jordi Grau-Moya (Senior Machine Learning Researcher) and Haitham Bou Ammar (Reinforcement Learning team leader).
At PROWLER.io, we’re pushing the frontiers of AI by tailoring agent behaviour to the needs of our users. One powerful new tool we’re developing to accomplish this is Tuneable AI: a way to train agents that enables their “rationality” to be adjusted and controlled. It’s a telling example of how things work here at PROWLER.io. By bringing together our three core technologies – reinforcement learning, probabilistic modelling and game theory – we’ve managed to develop a very useful tool that is destined to become a core AI technology in environments like games, robotics, autonomous vehicles, financial markets and beyond.
In games, Tuneable AI addresses one of the industry’s biggest challenges: maintaining player interest. It reduces the need for pre-defined, discrete difficulty levels and will deliver a seamless gaming experience that hits the sweet spot for players: where a game is never so easy that it’s boring, nor so difficult that it’s frustrating. This is one of the best ways to maximise player engagement and minimise the churn that can plague games with ongoing revenue streams, where player retention is key.
Enabling Tuneable AI in Single Agent Games:
Traditional game theory and current reinforcement learning techniques assume perfect rationality, which makes them difficult to adapt to real-world decision-making. Such unrealistic assumptions lead to unrealistic models, particularly in complex environments that involve human agents, like games. Human decision-making is rarely rational — and never perfectly so.
PROWLER.io has developed a practical solution to this widespread problem. By explicitly limiting an agent’s information-processing resources, we can effectively limit or “bound” its rationality in very useful ways. We accomplish this by giving agents a limited budget of resources to use in their search for behavioural policies that maximise rewards. This constraint translates into instantaneous penalty signals that force agents to find behavioural policies that trade off maximising rewards against minimising penalties on available resources.
This trade-off is governed by a scalar parameter that weights the penalty signal and enables a continuous scale of learning outcomes, ranging from the perfect rationality usually assumed in standard deep reinforcement learning (zero penalty weight) to maximally bounded behaviour (infinite penalty weight). This penalty parameter effectively becomes an adjustable dial that tunes the rationality of a reinforcement learning agent.
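A minimal sketch of this rationality dial, assuming a simple Boltzmann (softmax) policy over action values: the inverse temperature beta acts as the dial (roughly the inverse of the penalty weight), with beta = 0 giving maximally bounded, uniformly random behaviour and large beta approaching the perfectly rational, greedy policy. The function name and action values below are illustrative, not PROWLER.io’s actual DIN implementation.

```python
import numpy as np

def bounded_rational_policy(q_values, beta):
    """Boltzmann (softmax) policy over action values.

    beta is the rationality dial: beta = 0 gives uniformly random,
    maximally bounded behaviour; large beta approaches the greedy,
    perfectly rational policy.
    """
    logits = beta * np.asarray(q_values, dtype=float)
    logits -= logits.max()              # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

q = [1.0, 2.0, 4.0]                     # hypothetical action values
print(bounded_rational_policy(q, beta=0.0))   # uniform over all actions
print(bounded_rational_policy(q, beta=10.0))  # almost all mass on the best action
```

Turning the same dial continuously between these extremes produces the whole spectrum of behaviours in between.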
While this can in principle be applied to any reinforcement learning problem whose bounded rationality can be simplified as a resource allocation issue, we think Atari games best demonstrate the scalability of our approach to high-dimensional environments. These games are particularly challenging since environmental states can only be inferred from video frames and optimal behaviour needs to be encoded by a deep neural network.
In the following video, we use the Atari game Road Runner to demonstrate our approach – Deep Information Networks (DIN) – in comparison to DQNs and improved Double DQNs from DeepMind, which are agnostic to limits in information processing.
The imposed constraints clearly lead to both increased performance and faster learning: our bounded-rational agent collects more rewards (the small blue dots) in less time, and our DIN finishes the level much earlier than both DQN and Double DQN.
Results comparing training iterations for our method and DeepMind’s techniques. Our method requires fewer training iterations to achieve superior behaviour.
Enabling Tuneable AI in Two-Player Games:
When we extend Tuneable AI to two-player games, things get even more interesting. Now agents must interact with both the environment and another player, and they must model both their own and their opponent’s strategies. Both players’ rationality is now tuneable, and we can shift the equilibrium point of the game by changing their respective rationalities. With this technique, we can craft a wide range of behaviours, from extremely clumsy to highly skilful play styles.
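To illustrate how changing each player’s rationality shifts where the game settles, here is a hedged sketch using smoothed fictitious play in a toy zero-sum matrix game. The game matrix, the parameter names and the choice of fictitious-play dynamics are our own illustrative assumptions, not the algorithm from the work described above.

```python
import numpy as np

def softmax(values, beta):
    """Boltzmann distribution over values; beta is the rationality dial."""
    z = beta * np.asarray(values, dtype=float)
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def soft_equilibrium(payoff, beta_row, beta_col, iters=5000):
    """Smoothed fictitious play for a zero-sum matrix game.

    payoff[i, j] is the row player's reward (the column player receives
    the negative). Each player best-responds through a softmax with its
    own rationality parameter; time-averaged strategies are returned.
    """
    n_row, n_col = payoff.shape
    p_avg = np.full(n_row, 1.0 / n_row)   # row player's average strategy
    q_avg = np.full(n_col, 1.0 / n_col)   # column player's average strategy
    for t in range(1, iters + 1):
        p_br = softmax(payoff @ q_avg, beta_row)       # row responds to column
        q_br = softmax(-(payoff.T @ p_avg), beta_col)  # column responds to row
        p_avg += (p_br - p_avg) / t                    # running averages
        q_avg += (q_br - q_avg) / t
    return p_avg, q_avg

game = np.array([[0.0, 2.0], [3.0, 1.0]])  # toy zero-sum payoff matrix

# Two near-rational players settle close to the game's Nash value ...
p, q = soft_equilibrium(game, beta_row=10.0, beta_col=10.0)
value_balanced = p @ game @ q

# ... while a near-rational row player exploits a fully bounded column player.
p, q = soft_equilibrium(game, beta_row=10.0, beta_col=0.0)
value_lopsided = p @ game @ q
```

Lowering one player’s beta makes that player more exploitable, so the expected outcome of the game moves in the other player’s favour — which is exactly the equilibrium-shifting effect described above, in miniature.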
This is most useful when applied to AI vs human gameplay because it enables an AI to flexibly adapt to a human player’s abilities. By employing a maximum likelihood approach that uses a probabilistic model, an agent can now estimate a human’s bounded rationality and adjust its own accordingly.
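The maximum-likelihood idea can be sketched as fitting the rationality parameter of a Boltzmann policy to an opponent’s observed actions. Everything below — the grid search, the simulated opponent, and the true rationality value of 5 — is an illustrative assumption, not the probabilistic model used in the actual system.

```python
import numpy as np

def neg_log_likelihood(beta, q_values, actions):
    """Negative log-likelihood of observed actions under a Boltzmann
    policy with rationality beta (one row of q_values per decision)."""
    logits = beta * q_values
    logits -= logits.max(axis=1, keepdims=True)              # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(actions)), actions].sum()

def estimate_rationality(q_values, actions, betas=np.linspace(0.0, 10.0, 1001)):
    """Grid-search maximum-likelihood estimate of the opponent's beta."""
    nlls = [neg_log_likelihood(b, q_values, actions) for b in betas]
    return betas[int(np.argmin(nlls))]

# Simulate an opponent acting with true rationality beta = 5, then recover it.
rng = np.random.default_rng(0)
q = rng.normal(size=(2000, 4))                 # action values per decision
scores = np.exp(5.0 * q)
true_probs = scores / scores.sum(axis=1, keepdims=True)
acts = np.array([rng.choice(4, p=row) for row in true_probs])
beta_hat = estimate_rationality(q, acts)
print(beta_hat)  # close to the true value of 5
```

With the opponent’s rationality estimated this way, the agent can set its own dial relative to it — slightly below for an easier game, slightly above for a harder one.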
This technique can thus continually, interactively adjust the game to be easier, more difficult or more balanced.
Results on learning the opponent’s rationality, showing average reward (left), Bellman error (middle), and the value of the acquired rationality (right). The two values of the opponent’s rationality were set to 5 and -10. Our algorithm efficiently learns these values and produces the corresponding behaviour, as shown by both reward curves.
The approach used in these simple Atari demonstrations is just as applicable to contemporary games and adapts readily to other domains. Instead of shoehorning players and users into narrow, pre-defined difficulty levels, designers will be able to use tools like Tuneable AI to automatically and continually personalise interactions with users.
In wider applications, the ability to model and imitate bounded rationality means that AI can more effectively be integrated with and support human decision-making. Paradoxically, Tuneable AI uses Artificial Intelligence to make gaming - and potentially any complex system - feel a lot less artificial.