Faculty Recruiting

# Exploiting Structures in Interactive Decision Making

28 Sep
Wednesday, 09/28/2022 3:00pm to 5:00pm
Zoom
PhD Dissertation Proposal Defense
Speaker: Tongyi Cao

In this thesis we study several problems in interactive decision making. Interactive decision making plays an important role in many applications such as online advertisement and autonomous driving. Two classical problems are  multi-armed bandits and reinforcement learning. Here and more broadly, the central challenge is the \emph{exploration-exploitation} tradeoff, whereby the agent must decide whether to explore uncertain actions that could potentially bring high reward or to stick to the known good actions. Resolving this challenge is particularly difficult in settings with large or continuous state and action spaces. These structured settings are the focus of this thesis.

First we study the combinatorial pure exploration problem in the multi-arm bandit framework. In this problem, we are given $K$ distributions and a collection of subsets $\Vcal \subset 2^{[K]}$ of these distributions, and we would like to find the subset $v \in \Vcal$ that has largest mean, while collecting, in a sequential fashion, as few samples from the distributions as possible.  We develop new algorithms with strong statistical and computational guarantees by leveraging precise concentration-of-measure arguments and a reduction to linear programming.

Second we study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of a variant of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the \emph{zooming dimension} of the instance. Our results are the first provably adaptive guarantees for reinforcement learning in metric spaces.

Finally, we study a setting in reinforcement learning where actions only impact a part of the state (endogenous part) and the other part of the state (exogenous part) evolves independently. This formulation is relevant for applications in data center management and resource allocation, where the endogenous state space is typically combinatorial in nature. We plan to explore the design space of reinforcement learning algorithms for this setting and develop statistically efficient algorithms.