
Improving Reinforcement Learning Techniques by Leveraging Prior Experience

Thursday, 11/21/2019 1:00pm to 3:00pm
PhD Thesis Defense


In this dissertation, we develop techniques that leverage prior knowledge to improve the learning speed of existing reinforcement learning (RL) algorithms. RL systems can be expensive to train, which limits their applicability when a large number of agents must be trained to solve a large number of tasks, a situation that often occurs in industry but is often ignored in the RL literature. We develop three methods that leverage the experience obtained from solving a small number of tasks to improve an agent's ability to learn on new tasks it might face in the future.

First, we propose using compression algorithms to identify macros that are likely to be generated by an optimal policy. Because compression techniques identify sequences that occur frequently, they can be used to identify action patterns that are often required to solve a task.
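
As a rough illustration of this idea, the sketch below extracts macros from action sequences by repeatedly merging the most frequent adjacent pair of symbols, in the style of byte-pair encoding. The abstract does not say which compression algorithm the thesis uses, so the choice of pair merging is an assumption made here for illustration, and the names `most_frequent_pair`, `merge_pair`, and `extract_macros` are hypothetical.

```python
from collections import Counter


def most_frequent_pair(trajectories):
    """Return the most common adjacent symbol pair, or None if no pair repeats."""
    counts = Counter()
    for traj in trajectories:
        counts.update(zip(traj, traj[1:]))
    if not counts:
        return None
    pair, freq = counts.most_common(1)[0]
    return pair if freq > 1 else None


def merge_pair(traj, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(traj):
        if i + 1 < len(traj) and (traj[i], traj[i + 1]) == pair:
            merged.append(traj[i] + traj[i + 1])  # concatenate the two tuples
            i += 2
        else:
            merged.append(traj[i])
            i += 1
    return merged


def extract_macros(trajectories, num_macros):
    """Illustrative stand-in for a compression step (assumption, not the thesis method).

    Each trajectory is a sequence of primitive actions taken by a good policy.
    Returns up to `num_macros` macros, each a tuple of primitive actions.
    """
    # Represent every symbol as a tuple of primitive actions so merges concatenate.
    trajs = [[(action,) for action in traj] for traj in trajectories]
    macros = []
    for _ in range(num_macros):
        pair = most_frequent_pair(trajs)
        if pair is None:
            break  # nothing repeats, so no further compression is possible
        macros.append(pair[0] + pair[1])
        trajs = [merge_pair(traj, pair) for traj in trajs]
    return macros
```

For example, given the action sequences `URURD`, `URUR`, and `DUR`, calling `extract_macros` with `num_macros=3` first finds the macro `('U', 'R')`, then `('U', 'R', 'U', 'R')`, and then stops because no remaining pair occurs more than once.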

Second, we address some of the limitations of the first method by formalizing an optimization problem that allows an agent to learn a set of options appropriate for the tasks. Specifically, we propose an objective analogous to compression: minimizing the number of decisions an agent has to make to generate the observed optimal behavior. This technique also addresses a question that is often ignored in the options literature: how many options are needed?

Finally, we show that prior experience can also be leveraged to address the exploration-exploitation dilemma, a central problem in RL. We propose a framework in which a small number of tasks are used to train a meta-agent to explore. Once the meta-agent is trained, any agent facing a new task can query it for the action to take when exploring.
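
One way such a query could fit into an agent's action selection is sketched below. The abstract does not specify how the agent decides when to explore, so the epsilon-greedy rule is an assumption made here for illustration, and `select_action`, `advisor_policy`, and `q_values` are hypothetical names.

```python
import random


def select_action(q_values, advisor_policy, state, epsilon, rng=random):
    """Pick an action, deferring to a trained advisor whenever the agent explores.

    q_values: the agent's current action-value estimates for `state`.
    advisor_policy: callable mapping a state to an exploratory action
        (stands in for the trained meta-agent; an assumption of this sketch).
    epsilon: probability of exploring on this step.
    """
    if rng.random() < epsilon:
        # Explore: ask the advisor instead of sampling an action uniformly.
        return advisor_policy(state)
    # Exploit: take the action with the highest estimated value.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

The only change from standard epsilon-greedy selection in this sketch is the exploration branch, which replaces a uniformly random action with the advisor's suggestion.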

We show empirically that, when facing a large number of tasks, leveraging prior experience can be an effective way of improving existing reinforcement learning techniques. At present, the application of RL in industry remains rather limited, in part because training large-scale systems is costly and time-consuming. We hope this thesis provides some guidance for future work and inspires new research on exploiting existing knowledge to make RL a practical approach to large-scale real-world problems.

Advisor: Phil Thomas