Monte-Carlo Tree Search for Constrained POMDPs


NeurIPS 2018, December 2-8, Montreal, Canada

Authors: Jongmin Lee (KAIST), Geon-hyeong Kim (KAIST), Pascal Poupart (University of Waterloo), and Kee-Eung Kim (KAIST & PROWLER.io)

Abstract: Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. However, many real-world problems inherently have multiple goals, for which multi-objective formulations are more natural. The constrained POMDP (CPOMDP) is such a model: it extends the standard POMDP by maximizing reward subject to cost constraints. To date, solution methods for CPOMDPs have assumed an explicit model of the environment, and are thus hardly applicable to large-scale real-world problems. In this paper, we present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and requires only a black-box simulator of the environment. In the experiments, we demonstrate that CC-POMCP converges to the optimal stochastic action selection in CPOMDPs and pushes the state of the art by scaling to very large problems.
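To give a feel for the black-box setting described above, here is a minimal sketch (not the authors' implementation) of the two ingredients the abstract mentions: a simulator that returns both a reward and a cost, and Monte-Carlo action selection on the scalarized objective reward minus λ times cost, where λ plays the role of the LP-induced multiplier. The simulator dynamics and all names below are illustrative toy assumptions.

```python
import random

# Hypothetical black-box simulator for a toy constrained problem:
# given (state, action) it returns (next_state, observation, reward, cost).
# Action 1 earns more reward but also incurs a cost.
def simulator(state, action, rng):
    next_state = (state + action) % 3
    observation = next_state          # fully observed in this toy example
    reward = 1.0 if action == 1 else 0.5
    cost = 1.0 if action == 1 else 0.0
    return next_state, observation, reward, cost

def rollout_value(state, action, lam, depth, gamma, rng):
    """Monte-Carlo estimate of the scalarized return: sum of
    discounted (reward - lam * cost) along one random rollout."""
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(depth):
        s, _, r, c = simulator(s, a, rng)
        total += discount * (r - lam * c)
        discount *= gamma
        a = rng.choice([0, 1])        # random rollout policy
    return total

def greedy_action(state, lam, rng, n_sims=200, depth=10, gamma=0.95):
    """Pick the action with the best average scalarized rollout value."""
    best_a, best_v = None, float("-inf")
    for a in (0, 1):
        v = sum(rollout_value(state, a, lam, depth, gamma, rng)
                for _ in range(n_sims)) / n_sims
        if v > best_v:
            best_a, best_v = a, v
    return best_a

rng = random.Random(0)
# With lam = 0 the costly action 1 wins on reward alone; with a large
# lam the cost penalty flips the preference to the cheap action 0.
print(greedy_action(0, lam=0.0, rng=rng))
print(greedy_action(0, lam=5.0, rng=rng))
```

CC-POMCP itself goes further: it embeds this scalarization inside a POMCP search tree and adjusts λ online toward the cost constraint, rather than fixing it by hand as this sketch does.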

Tags: Reinforcement Learning, NeurIPS, Monte-Carlo Tree Search, Constrained Optimisation


See paper
