Computer says you need an umbrella

Probabilistic Models for Complex Decision-Making

The day-to-day decisions we make might appear simple, even trivial, says James Hensman, our Head of Probabilistic Modelling. But the mathematical algorithms behind them are far more complex; in this blog post he unravels some of the secrets that make our approach to AI work.

"Decision Theory is trivial, apart from the computational details."
David MacKay

By opening a chapter of his 2003 book on information theory with this provocative statement, Sir David MacKay might have been anticipating the creation of our company thirteen years later. Though his tone was tongue-in-cheek, he was making a crucial point: computational details are the central problem in decision-making. Those details, such as statistical analysis, predictive analytics and multi-agent theory, are what we do here.

What David meant when he said decision theory is trivial was that it requires just one relatively simple equation:

$$a^{*} = \arg\max_{a} \int U(x, a)\, p(x \mid a)\, \mathrm{d}x$$

Here we have some actions (a), some possible outcomes (x) and a function U that defines how much I like those outcomes. This function can be thought of as the “Utility” or “Reward” (or, with its sign flipped, the “Loss”), or even simply the “liking” function. It measures, for instance, how much I like money, or how much a mouse likes cheese or fears mousetraps.

Altogether, the equation tells us how, given a probability distribution over the outcomes and accounting for how much we value each one, we can pick the decision that does best on average.
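For a finite set of actions and outcomes, the expected utility of an action is just a probability-weighted sum of its utilities, and picking the best action fits in a few lines. The sketch below is a minimal illustration under that finite assumption, not code from our platform; the function names and the mouse's probabilities and utilities are hypothetical numbers chosen for the example.

```python
from typing import Callable, Dict, Hashable, Iterable


def best_action(
    actions: Iterable[Hashable],
    p_outcome: Callable[[Hashable], Dict[Hashable, float]],
    utility: Callable[[Hashable, Hashable], float],
) -> Hashable:
    """Return the action a maximising sum_x U(x, a) * p(x | a)."""

    def expected_utility(a: Hashable) -> float:
        # Weight each outcome's utility by its probability under action a.
        return sum(p * utility(x, a) for x, p in p_outcome(a).items())

    return max(actions, key=expected_utility)


# Hypothetical mouse: going for the cheese risks the trap; staying home does not.
def mouse_outcomes(action):
    if action == "eat_cheese":
        return {"cheese": 0.8, "trapped": 0.2}
    return {"hungry": 1.0}


def mouse_utility(outcome, action):
    # This toy utility depends only on the outcome, not on the action taken.
    return {"cheese": 10.0, "trapped": -100.0, "hungry": -1.0}[outcome]


print(best_action(["eat_cheese", "stay_home"], mouse_outcomes, mouse_utility))
```

Here going for the cheese averages 0.8 × 10 + 0.2 × (−100) = −12, while staying home scores −1, so the cautious mouse stays home.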

Let’s try a very basic, very British, example. Should I take an umbrella tomorrow?

Rain tomorrow?   Take umbrella?   Utility
No               No               +1
No               Yes               0
Yes              Yes               0
Yes              No               -1

My decision is “should I take an umbrella or not?”, and the numbers in the right-hand column represent my reward, or utility, function. If it doesn’t rain tomorrow and I haven’t taken the umbrella, I get +1. If I take the umbrella and it doesn’t rain, I’m mildly inconvenienced (0); the same goes if it does rain and I have my umbrella (0). At the bottom is my worst outcome: it rains, I don’t take the umbrella, and I get soaked (-1).

But the details behind how we model and make predictions about the weather can be immensely complex. We invest an awful lot of money and time putting up satellites and developing algorithms to find out if, when and where it will rain, and the result is an increasingly useful and precise probabilistic forecast of rain over the next week, day or hour. But the decision itself is easy: I multiply how much I like each outcome by that outcome’s probability, sum these to get the expected utility of each action, and take the action with the higher value. The decision space is really small, indeed trivial: I’ve only got one bit to flip, umbrella or no umbrella.
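With a probability p of rain, the umbrella table gives an expected utility of 0 for taking the umbrella and (1 - p)(+1) + p(-1) = 1 - 2p for leaving it at home, so the umbrella wins only when p > 0.5. A minimal Python check of that arithmetic, using the utilities from the table; the function names are ours, chosen for illustration:

```python
# Utility table from the umbrella example: (rain, take_umbrella) -> utility.
UTILITY = {
    (False, False): +1,  # dry and unencumbered
    (False, True): 0,    # carried an umbrella for nothing
    (True, True): 0,     # it rained, but I stayed dry
    (True, False): -1,   # soaked
}


def expected_utility(take_umbrella: bool, p_rain: float) -> float:
    """Average the utility of one action over the two weather outcomes."""
    return (p_rain * UTILITY[(True, take_umbrella)]
            + (1.0 - p_rain) * UTILITY[(False, take_umbrella)])


def should_take_umbrella(p_rain: float) -> bool:
    """Pick the action with the higher expected utility (on a tie, leave it home)."""
    return expected_utility(True, p_rain) > expected_utility(False, p_rain)


for p in (0.1, 0.5, 0.7):
    print(p, should_take_umbrella(p))
```

All the hard work lives in producing p_rain; once the forecast exists, the decision is a one-line comparison.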

Machine Learning is good at supporting decisions like this one. Even when the prediction itself is difficult, Machine Learning works exceptionally well when the decision is very small. My favourite example is Apple’s face unlock technology: hold the iPhone in front of your face, and it quickly works out whether it should unlock itself. The action space is tiny (unlock or stay locked), but behind it is a very difficult prediction.

However, if you want to make decisions in big, complex, uncertain environments, you’re going to need probabilistic modelling. When the action space is large and the interactions between everything in it are much more complicated, you need probabilities to represent the resulting uncertainty.

How is Probabilistic Modelling different from other machine learning algorithms?

Most Machine Learning methods are basically number mappers. You feed some numbers X into a box, run them through some parameters in the middle, and push the resulting numbers Y out the other end. Machine learning is all about adapting those parameters so that the relationship between what goes in and what comes out is satisfactory. For example, X could be an image and Y a label or tag; if the model produces the right tag for the image, the parameters in the middle are doing their job.
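As a minimal sketch of a number mapper, here is a one-parameter linear model y ≈ w · x fitted by least squares. It uses plain numbers instead of images and tags for brevity, and the data are hypothetical; the point is that training ends with one best value of w and predictions come out as bare numbers.

```python
def fit_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares estimate of w in y ~ w * x: a single best parameter."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def predict(w: float, x: float) -> float:
    """A number goes in, a number comes out, with no statement of uncertainty."""
    return w * x


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w = fit_slope(xs, ys)
print(w, predict(w, 4.0))
```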

But to forecast something much more complicated, such as a whole set of points in a complex system, we need a probabilistic model that keeps a distribution over all the plausible parameters. We take examples of Xs and Ys and ask: which parameters can plausibly explain the relationship between them? This is computationally expensive: instead of a single set of parameters to twiddle, we have to maintain a distribution over every set of parameters that might give a reasonable mapping. For example, to forecast where in a city people will want pizza delivered tomorrow, you need more than a single number; you need a statistical distribution. Or if you’re prospecting for oil and trying to decide where to drill, you need probability distributions over the location, type and amount of oil nearby.
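To make the contrast concrete, here is a minimal sketch of the probabilistic version of a one-parameter linear model y = w · x + noise: Bayesian linear regression, which keeps a Gaussian distribution over w instead of a single value. It is one standard textbook technique, swapped in for illustration, and assumes a zero-mean Gaussian prior on w and Gaussian noise of known variance; the data are hypothetical.

```python
import math


def posterior_slope(xs, ys, prior_precision=1.0, noise_var=1.0):
    """Gaussian posterior over w in y = w * x + noise (Bayesian linear regression).

    Assumes a N(0, 1 / prior_precision) prior on w and Gaussian observation
    noise with variance noise_var. Returns (posterior mean, posterior variance).
    """
    precision = prior_precision + sum(x * x for x in xs) / noise_var
    mean = sum(x * y for x, y in zip(xs, ys)) / noise_var / precision
    return mean, 1.0 / precision


def predictive(mean, var, x, noise_var=1.0):
    """Predictive mean and standard deviation of y at a new input x."""
    return mean * x, math.sqrt(noise_var + x * x * var)


m, v = posterior_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(m, v)                   # a distribution over w, not a single value
print(predictive(m, v, 4.0))  # a forecast with error bars
```

The posterior variance of w shrinks as more data arrives, so the model knows how much it knows; with only three points, the prior still pulls the mean slope slightly below 2.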

Why are we interested in these probabilistic models, and what sorts of things would we like to forecast? The potential applications are extremely broad. The spatiotemporal patterns we need to understand abound in financial markets, supply chains and logistics. In smart cities, for instance, we will need to forecast what happens in much the same way as the Met Office forecasts the weather. If the forecast calls for rain, how will that affect demand for a service? How will traffic affect demand? How will the weather and traffic interact? The companies that can optimise the provision of their services, understanding in advance the outcomes of their decisions, will be the only ones that survive and thrive in a smart economy.

Let’s have a look at a couple of technologies we’ve developed to tackle these sorts of problems.

This is a model we’ve built using a large open source dataset from the city of Porto. It’s a map of downtown Porto overlaid with a heat map of demand for taxi services, output from a probabilistic model on our VUKU platform. The brighter patches show where and when we expect people to request taxis, and the clock in the bottom left corner of the screen shows how demand evolves throughout the day. The white markers that pop up indicate where an actual taxi request was made. The more people request taxis from a location, the brighter that part of the map becomes. This simple model forecasts demand hotspots dynamically, allowing drivers and dispatchers to anticipate demand, bring down waiting times and beat the competition.
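The post does not describe how the VUKU platform models demand, so the following is only a generic stand-in: a conjugate Gamma-Poisson model that keeps a distribution over the request rate of each (grid cell, hour) pair and turns it into a forecast with uncertainty. The class, cell names, hours and counts are all hypothetical.

```python
from collections import defaultdict


class DemandHeatMap:
    """Hypothetical Gamma-Poisson demand model per (grid cell, hour of day).

    Each (cell, hour) has an unknown Poisson request rate with a
    Gamma(shape, rate) prior. Because the Gamma prior is conjugate to the
    Poisson likelihood, observing daily counts updates it in closed form.
    """

    def __init__(self, prior_shape: float = 1.0, prior_rate: float = 1.0):
        self.prior_shape = prior_shape
        self.prior_rate = prior_rate
        self.total = defaultdict(float)  # summed requests per (cell, hour)
        self.days = defaultdict(int)     # number of days observed per (cell, hour)

    def observe(self, cell: str, hour: int, count: int) -> None:
        """Record one day's request count for a cell at a given hour."""
        self.total[(cell, hour)] += count
        self.days[(cell, hour)] += 1

    def forecast(self, cell: str, hour: int) -> tuple[float, float]:
        """Posterior-predictive mean and variance of the next day's count.

        With posterior shape a and rate b, the predictive count is negative
        binomial with mean a / b and variance a / b + a / b**2.
        """
        a = self.prior_shape + self.total.get((cell, hour), 0.0)
        b = self.prior_rate + self.days.get((cell, hour), 0)
        mean = a / b
        return mean, mean + a / b**2


heat_map = DemandHeatMap()
for count in (3, 5, 4):  # three hypothetical mornings at 08:00
    heat_map.observe("riverside", 8, count)
print(heat_map.forecast("riverside", 8))  # (3.25, 4.0625)
print(heat_map.forecast("airport", 8))    # never observed, prior only: (1.0, 2.0)
```

A cell with no observations still gets a forecast, drawn from the prior and relatively more uncertain, which is one simple way a probabilistic model copes with gaps in the data.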

The same data can be used to predict some interesting details for each forecast journey.

In this model, we can forecast the probable destinations for taxi requests made from specific locations. As we click on different locations on the map, we change the hypothetical starting point of a taxi journey and generate a heat map forecast of the probable destinations. By adjusting the clock, we can then estimate where a taxi request made from a particular place and time is likely to end up. This allows us to optimise dispatches and further lower waiting times for both drivers and passengers.
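One simple way to represent a destination forecast like this is a categorical distribution p(destination | origin, hour) with a Dirichlet prior, whose posterior predictive is just smoothed counts. This is an assumption for illustration, not a description of the model behind the map; the class and zone names are hypothetical.

```python
from collections import Counter, defaultdict


class DestinationModel:
    """Hypothetical Dirichlet-categorical model of p(destination | origin, hour)."""

    def __init__(self, zones, pseudo_count: float = 1.0):
        self.zones = list(zones)
        self.pseudo_count = pseudo_count    # Dirichlet prior weight per zone
        self.counts = defaultdict(Counter)  # (origin, hour) -> destination counts

    def observe(self, origin: str, hour: int, destination: str) -> None:
        """Record one completed journey."""
        if destination not in self.zones:
            raise ValueError(f"unknown zone: {destination}")
        self.counts[(origin, hour)][destination] += 1

    def distribution(self, origin: str, hour: int) -> dict:
        """Posterior-predictive probability of each destination zone."""
        seen = self.counts.get((origin, hour), Counter())
        total = sum(seen.values()) + self.pseudo_count * len(self.zones)
        return {z: (seen[z] + self.pseudo_count) / total for z in self.zones}


model = DestinationModel(["riverside", "station", "airport"])
for dest in ("station", "station", "station", "airport"):
    model.observe("riverside", 9, dest)
print(model.distribution("riverside", 9))
```

The pseudo-counts keep every zone's probability above zero, and an origin and hour with no recorded journeys falls back to a uniform distribution rather than failing.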

This can’t be done with a mere number mapping system, because we can’t know well in advance exactly where and when taxi journeys will happen; we just don’t have the data. There’s always missing data in this sort of complex system, and probabilistic modelling is how we account for the resulting uncertainty.

Anybody who is thinking of providing services in a city, along a supply chain or through a port is going to need predictive analytics and probabilistic models. Decision theory may start from a single equation, but behind that equation there’s a lot of computational detail, which our Probabilistic Modelling (PM) team handles together with our colleagues in Reinforcement Learning and Multi-Agent Systems. We’re working together on new solutions, such as model-based reinforcement learning and Tuneable AI, that are already solving some of the biggest challenges in Decision-Making AI.

Join us to make AI that will change the world
