Variable Risk Policy Search for Dynamic Robot Control

Friday, 08/24/2012 6:00am to 8:00am
Ph.D. Thesis Defense

Scott Kuindersma

Computer Science Building, Room 151

A central goal of the robotics community is to develop general optimization algorithms for producing high-performance dynamic behaviors in robot systems. This goal is challenging because many robot control tasks are characterized by significant stochasticity, high dimensionality, expensive evaluations, and unknown or unreliable system models. Despite these challenges, a range of algorithms exists for efficiently optimizing parameterized control policies with respect to average cost criteria. However, other statistics of the cost may also be important. In particular, for many stochastic control problems, it can be advantageous to select policies based not only on their average cost, but also on the variance of that cost (or risk).

In this thesis, I present new, efficient global and local risk-sensitive stochastic optimization algorithms suitable for performing policy search in a wide variety of problems of interest to robotics researchers. These algorithms exploit new techniques in nonparametric heteroscedastic regression to directly model the policy-dependent distribution of cost. For local search, learned cost models can be used as critics for performing risk-sensitive gradient descent. Alternatively, decision-theoretic criteria can be applied to globally select policies to balance exploration and exploitation in a principled way, or to perform greedy minimization with respect to various risk-sensitive criteria. This separation of learning and policy selection leads to variable risk control, where risk sensitivity can be flexibly adjusted and appropriate policies can be selected at runtime without requiring additional policy executions.
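
As a rough illustration of the variable risk idea described above, the following Python sketch selects a policy by greedily minimizing expected cost plus a multiple of the cost standard deviation. Everything in it is an assumption made for illustration: the closed-form functions `cost_mean` and `cost_std` stand in for a learned heteroscedastic cost model, and the one-dimensional policy parameter `theta`, the risk factor `kappa`, and the mean-plus-scaled-deviation criterion are hypothetical choices, not necessarily those used in the thesis.

```python
import numpy as np


def cost_mean(theta):
    # Hypothetical stand-in for the learned mean cost of policy parameter theta.
    return (theta - 0.7) ** 2


def cost_std(theta):
    # Hypothetical stand-in for the learned, policy-dependent cost standard
    # deviation (heteroscedastic: the noise level grows with theta).
    return 0.05 + 0.5 * theta


def select_policy(candidates, kappa):
    # Greedy minimization of an assumed risk-sensitive criterion:
    #   mean cost + kappa * standard deviation of cost.
    # kappa > 0 is risk-averse, kappa = 0 risk-neutral, kappa < 0 risk-seeking.
    scores = cost_mean(candidates) + kappa * cost_std(candidates)
    return candidates[np.argmin(scores)]


# Candidate policy parameters on a fixed grid.
candidates = np.linspace(0.0, 1.0, 101)

# The same cost model is reused for every risk setting; only kappa changes.
risk_neutral = select_policy(candidates, kappa=0.0)
risk_averse = select_policy(candidates, kappa=1.0)
risk_seeking = select_policy(candidates, kappa=-1.0)

print(risk_averse, risk_neutral, risk_seeking)
```

In this toy setting, changing `kappa` re-ranks the candidate policies using the same cost model, so no further policy executions are needed to change the risk sensitivity. The risk-averse choice moves toward lower-variance parameters and the risk-seeking choice toward higher-variance ones.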

To evaluate these algorithms and highlight the importance of risk in dynamic control tasks, I describe several experiments with the UMass uBot-5 that include learning dynamic arm motions to stabilize after large impacts, lifting heavy objects while balancing, and developing safe fall-bracing behaviors. The results of these experiments suggest that the ability to select policies based on risk-sensitive criteria can lead to greater flexibility in dynamic behavior generation.

Advisors: Roderic A. Grupen & Andrew G. Barto