Non-convex Policy Search Using Variational Inequalities



Published in Neural Computation, Volume 29, Issue 10, MIT Press, October 2017

Authors: Yusen Zhan, Haitham Bou-Ammar (PROWLER.io), and Matthew E. Taylor 

Abstract: Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that could potentially damage hardware units. Motivated by such constraints, projection-based methods have been proposed to ensure safe policies. These methods, however, can handle only convex policy constraints. In this letter, we propose the first safe policy search reinforcement learner capable of operating under nonconvex policy constraints. This is achieved by observing, for the first time, a connection between nonconvex variational inequalities and policy search problems. We provide two algorithms, Mann and two-step iteration, to solve the above problems and prove convergence in the nonconvex stochastic setting. Finally, we demonstrate the performance of the algorithms on six benchmark dynamical systems and show that our new method is capable of outperforming previous methods under a variety of settings.
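To give a flavour of the Mann-type scheme mentioned in the abstract, the sketch below applies a Mann (averaged) projected iteration to a toy variational inequality VI(F, C). Everything here is illustrative, not the paper's actual method: F is taken as the gradient of a simple quadratic, and the nonconvex constraint set C is the unit circle, whose projection is just normalisation. The step sizes `gamma` and `alpha` and the iteration count are arbitrary choices for the example.

```python
import numpy as np

# Illustrative Mann-type projected iteration for VI(F, C).
# Assumptions (not from the paper): F(x) = x - target is the gradient of
# 0.5 * ||x - target||^2, and C is the nonconvex unit circle {x : ||x|| = 1}.

TARGET = np.array([2.0, 0.0])

def F(x):
    # Operator of the variational inequality (here a simple gradient field).
    return x - TARGET

def project_unit_circle(x):
    # Euclidean projection onto the (nonconvex) unit circle.
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.array([1.0, 0.0])

def mann_iteration(x0, gamma=0.5, alpha=0.5, iters=200):
    x = x0
    for _ in range(iters):
        y = project_unit_circle(x - gamma * F(x))  # projected operator step
        x = (1.0 - alpha) * x + alpha * y          # Mann averaging step
    return x

x_star = mann_iteration(np.array([0.0, 1.0]))
# x_star approaches [1, 0], the point on the circle closest to TARGET.
```

The averaging step is what distinguishes a Mann iteration from a plain projected-gradient update: instead of jumping to the projected point, each iterate is a convex combination of the current point and the projected step, which is what the convergence analysis for such schemes typically exploits.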

Tags: Non-Convexity, Optimisation, Reinforcement Learning


See paper