
Policy Gradient Methods: Analysis, Misconceptions, and Improvements

Friday, 03/24/2023 1:30pm to 3:00pm
LGRC A215
PhD Dissertation Proposal Defense
Speaker: Chris Nota

Abstract: Policy gradient methods are a class of reinforcement learning algorithms that optimize a parametric policy by maximizing an objective function that directly measures the policy's performance. Despite their use in many high-profile applications of reinforcement learning, policy gradient methods as conventionally applied in practice deviate from existing theory. This thesis presents a comprehensive mathematical analysis of policy gradient methods, uncovering misconceptions and suggesting novel solutions to improve their performance.
 
We first demonstrate that the update rule used by most policy gradient methods does not correspond to the gradient of any objective function due to the way the discount factor is applied, leading to suboptimal convergence. Subsequently, we show that even when this is taken into account, existing policy gradient algorithms are suboptimal in that they fail to eliminate several sources of variance. To address the first issue, we show that by gradually increasing the discount factor at a particular rate, we can restore the optimal convergence of policy gradient methods. To further address the issue of high variance, we propose a new value function called the posterior value function. This function leverages additional information from later in trajectories that was previously thought to introduce bias. With this function, we construct a new stochastic estimator that eliminates several sources of variance present in most policy gradient methods.
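For readers unfamiliar with the mismatch the abstract alludes to, the following sketch (in our own notation, not drawn from the thesis itself) illustrates the kind of discrepancy at issue between the discounted objective's gradient and the update most implementations follow. For a parametric policy \pi_\theta, define the discounted objective and return

    J(\theta) = \mathbb{E}\Big[ \sum_{t=0}^{\infty} \gamma^t R_t \Big], \qquad G_t = \sum_{k=t}^{\infty} \gamma^{k-t} R_k .

The policy gradient theorem for this objective weights each time step by \gamma^t:

    \nabla J(\theta) = \mathbb{E}\Big[ \sum_{t=0}^{\infty} \gamma^t \, G_t \, \nabla_\theta \log \pi_\theta(A_t \mid S_t) \Big],

whereas common implementations drop the leading \gamma^t factor and follow

    \hat{g} = \mathbb{E}\Big[ \sum_{t=0}^{\infty} G_t \, \nabla_\theta \log \pi_\theta(A_t \mid S_t) \Big],

which is not the gradient of J(\theta); the talk's claim is that, more generally, such an update is not the gradient of any objective function.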

 

Advisor: Philip Thomas