
Reinforcement Learning for Non-Stationary Problems

Monday, 03/07/2022 2:00pm to 3:00pm
Hybrid - LGRC A311 and Zoom
PhD Dissertation Proposal Defense
Speaker: Yash Chandak

Abstract: Reinforcement learning (RL) has emerged as a general-purpose technique for addressing problems involving sequential decision-making. However, most RL methods are based on the key assumption that the transition dynamics and reward functions are fixed, that is, that the underlying Markov decision process is stationary. Unfortunately, in many real-world applications this assumption is violated, and so using existing algorithms may result in a performance lag or false safety guarantees. For example, personalized automated healthcare systems and other human-computer interaction systems need to constantly account for changes in human behavior and physiology over time. In my work, I will discuss methods that can (a) proactively search for a good future policy, and (b) do so in a safe and robust manner in the presence of structured non-stationarity.

To proactively search for a good future policy, we present a policy gradient algorithm that maximizes a forecast of future performance. This forecast is obtained by fitting a curve to counterfactual estimates of policy performance over time, without explicitly modeling the underlying non-stationarity. The resulting algorithm automatically re-weights past data non-uniformly, and we observe that minimizing performance on some of the past data can be beneficial when searching for a policy that maximizes future performance.
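As a rough illustration of the forecasting idea only, the following Python sketch fits a curve to per-episode counterfactual performance estimates and extrapolates it one step ahead. It assumes simple per-episode importance sampling and a least-squares polynomial fit; the function names and synthetic data are hypothetical and this is not the algorithm presented in the work.

# Minimal sketch: forecast future policy performance by fitting a curve
# to counterfactual (importance-sampling) estimates of past performance.
import numpy as np

def counterfactual_estimates(returns, behavior_probs, target_probs):
    """Per-episode importance-sampling estimates of the target policy's
    performance, computed from data gathered by the behavior policy."""
    # Product of per-step importance ratios for each episode.
    rho = np.prod(np.asarray(target_probs) / np.asarray(behavior_probs), axis=1)
    return rho * np.asarray(returns)

def forecast_future_performance(estimates, degree=1, horizon=1):
    """Fit a polynomial in time to the estimates and extrapolate
    `horizon` steps into the future."""
    t = np.arange(len(estimates))
    coeffs = np.polyfit(t, estimates, deg=degree)   # least-squares fit
    return np.polyval(coeffs, len(estimates) - 1 + horizon)

# Example with synthetic data: performance drifts upward over 20 episodes.
rng = np.random.default_rng(0)
returns = np.linspace(0.2, 0.8, 20) + 0.05 * rng.standard_normal(20)
probs = np.full((20, 10), 0.5)                      # 10 steps per episode
est = counterfactual_estimates(returns, probs, probs)
print("forecast of next-episode performance:", forecast_future_performance(est))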

While several prior works have proposed methods that are safe for deployment, they assume that the underlying problem is stationary. However, many real-world problems exhibit non-stationarity and involve critical systems with both financial and human risks. When stakes are high, the cost associated with a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, by combining counterfactual reasoning in reinforcement learning with time-series analysis.
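As a loose illustration only, and not the proposed Seldonian method, the sketch below fits a linear trend to counterfactual performance estimates of a candidate policy and approves deployment only if a one-sided lower prediction bound on its forecasted future performance clears a safety threshold. The names safe_to_deploy, alpha, and threshold are assumptions made for this sketch.

# Minimal sketch: high-confidence safety test under a smoothly drifting trend.
import numpy as np
from scipy import stats

def safe_to_deploy(estimates, threshold, alpha=0.05, horizon=1):
    """Return (is_safe, lower_bound): fit a linear time trend to the
    candidate policy's counterfactual performance estimates, then check
    whether a one-sided lower prediction bound on its forecasted future
    performance exceeds the safety threshold."""
    t = np.arange(len(estimates), dtype=float)
    X = np.column_stack([np.ones_like(t), t])          # intercept + trend
    beta, res, _, _ = np.linalg.lstsq(X, estimates, rcond=None)
    n, p = X.shape
    dof = n - p
    sigma2 = res[0] / dof if res.size else 0.0         # residual variance
    x_new = np.array([1.0, t[-1] + horizon])           # future time point
    pred = x_new @ beta
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(sigma2 + x_new @ cov @ x_new)         # prediction std. error
    lower = pred - stats.t.ppf(1 - alpha, dof) * se    # one-sided lower bound
    return lower >= threshold, lower

# Example: performance trending downward toward the threshold gets rejected.
rng = np.random.default_rng(1)
est = np.linspace(0.9, 0.5, 15) + 0.02 * rng.standard_normal(15)
print(safe_to_deploy(est, threshold=0.6))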

 

Advisor: Philip Thomas
