
Reinforcement Learning for Non-Stationary Problems

Wednesday, 04/27/2022 2:00pm to 3:00pm
Zoom
PhD Thesis Defense
Speaker: Yash Chandak

Abstract: Reinforcement learning (RL) has emerged as a general-purpose technique for addressing problems involving sequential decision-making. However, most RL methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, that the underlying Markov decision process is stationary. This limits the application of such methods, because real-world problems are often subject to changes due to external factors (passive non-stationarity), changes induced by interactions with the system itself (active non-stationarity), or both (hybrid non-stationarity). For example, applications like personalized automated healthcare, online education, and product recommendation, and in fact almost all human-computer interaction systems, need to account not only for the continually drifting behavior of the user demographic, but also for how the preferences of users may change due to interactions with the system.
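To make this taxonomy concrete, one common way to formalize the three regimes (the notation below is an illustrative assumption, not taken from the abstract or the thesis) is to index the environment by the episode number:

```latex
% Illustrative notation; not taken from the thesis itself.
\begin{align*}
  \text{Stationary MDP:} \quad & p(s' \mid s, a),\; r(s, a)
    && \text{fixed for all episodes,} \\
  \text{Passive:} \quad & p_k(s' \mid s, a),\; r_k(s, a)
    && \text{drift with the episode index } k, \\
  \text{Active/hybrid:} \quad & p_k(s' \mid s, a, H_{k-1}),\; r_k(s, a, H_{k-1})
    && \text{drift depends on past interactions } H_{k-1}.
\end{align*}
```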


For settings where the problem exhibits only passive non-stationarity, I will first present methods that can (a) proactively search for a good future policy, and (b) do so in a safe and robust manner in the presence of structured changes, without ever requiring the agent to model the environment. Extending these methods to settings with active or hybrid non-stationarity poses additional challenges, as such non-stationarity can make the outcomes of future decisions depend on all past interactions, effectively resulting in a single lifelong sequence of interactions. To overcome this challenge, I will present the first steps towards addressing the fundamental problem of (off-)policy evaluation using a generalized procedure that can account for structured changes due to any of the passive, active, or hybrid non-stationarities. The proposed approaches are based on merging reinforcement learning and counterfactual reasoning with time-series analysis; a minimal sketch of this combination is given below.
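As a rough illustration of how forecasting enters the picture, here is a minimal sketch (hypothetical code; the function names and the linear-trend model are my assumptions, not the algorithms defended in the thesis). It combines per-episode importance-sampling estimates of a policy's performance with a least-squares trend fit, so that under passive drift the agent forecasts a policy's performance in upcoming episodes rather than averaging its past performance:

```python
# Minimal sketch (hypothetical): off-policy evaluation per episode,
# followed by a time-series forecast of future performance.
import numpy as np

def per_episode_is_estimate(trajectory, pi_eval, pi_behavior):
    """Ordinary importance-sampling estimate of pi_eval's return from one
    trajectory of (state, action, reward) tuples collected by pi_behavior."""
    rho, ret = 1.0, 0.0
    for s, a, r in trajectory:
        rho *= pi_eval(a, s) / pi_behavior(a, s)  # likelihood ratio
        ret += r
    return rho * ret

def forecast_future_performance(past_estimates, horizon):
    """Fit a linear trend to per-episode estimates and extrapolate the
    policy's expected performance `horizon` episodes into the future."""
    k = np.arange(len(past_estimates))
    slope, intercept = np.polyfit(k, past_estimates, deg=1)
    return intercept + slope * (len(past_estimates) - 1 + horizon)

# Usage (assuming `episodes` holds past trajectories):
# estimates = [per_episode_is_estimate(tau, pi_eval, pi_b) for tau in episodes]
# print(forecast_future_performance(estimates, horizon=3))
```

A stationarity-assuming estimator would simply average past_estimates; extrapolating the trend instead is what lets an agent proactively prefer the policy that is expected to perform well in the future.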


Advisor: Philip Thomas