PhD Thesis Defense: Khoshrav Doctor, Learning Structure to Support Autonomous Control Decisions
Speaker:
Khoshrav Doctor
Abstract:
There is a great deal of consequential research on autonomous systems, with applications ranging from space exploration to self-driving vehicles. These systems frequently make critical control decisions in the absence of complete state information, a condition that stems from noisy sensors, noisy actuators, and environmental dynamics. Intelligent biological agents can acquire and exploit background knowledge summarizing past experience of situated control dynamics to help address this challenge. Probability distributions are attractive for representing such knowledge and, thus, the behavior of systems with incomplete information. Probabilistic structure in the environment is also useful for model-based learning and planning systems: it supports agents (biological or otherwise) in sequential decision making and in active information gathering that reduces uncertainty while achieving desirable states and avoiding unrecoverable failure. The desire to use models for decision making raises questions about what these models should represent and how they can be acquired autonomously.
In embodied systems, structure can be revealed by reliable patterns of flow in the control transitions. This dissertation examines techniques for learning such structure and discusses how the resulting background knowledge is used in decision making under uncertainty. The goal is to provide a means of learning controllable interactions autonomously that applies generally to learning and planning systems formulated as Partially Observable Markov Decision Processes (POMDPs), and to measure the impact of the acquired models on representative learning and planning tasks. It aims to support the learning of models that predict interaction dynamics from recognizable environmental structure: a form of cognitive artifact that describes probabilistic roadmaps in state-action spaces.
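For reference, the POMDP formalism invoked here is standard: a tuple

$$\langle S, A, T, R, \Omega, O, \gamma \rangle,$$

where $S$ is the set of states, $A$ the set of actions, $T$ the state-transition function, $R$ the reward function, $\Omega$ the set of observations, $O$ the observation function, and $\gamma$ a discount factor. Because the agent cannot observe the state directly, it maintains a belief $b(s)$, a probability distribution over states, and plans over beliefs rather than states.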
This dissertation considers learning a model of the transition function $T$, which provides a probability distribution over the outcome states that occur when an action is performed from a given state. These models support the agent in reasoning over trajectories of (either open- or closed-loop) controlled state transitions in order to solve a task. I study mechanisms for learning these models through autonomous exploration in a manner that addresses issues of salience, efficiency, coverage, and completeness. The effects of the resulting models on the quality of autonomous behavior are measured using a POMDP framework for model-based planning: first on the Kidnapped Robot Problem using a sparse feature space, and then on an object recognition problem. Both tasks are examples in which an agent is confronted with incomplete information and must actively select actions to mitigate uncertainty based on prior models.
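A minimal sketch of these objects, assuming the standard discrete POMDP formulation above (the estimator shown is illustrative; the dissertation's actual learning mechanism may differ): the transition model is

$$T(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s,\, a_t = a),$$

which can be estimated from exploration data by, for example, maximum-likelihood counting, $\hat{T}(s' \mid s, a) = N(s, a, s') / N(s, a)$. Such a model supports active uncertainty reduction through the Bayes-filter belief update after executing action $a$ and receiving observation $o$:

$$b'(s') \propto O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s).$$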
Advisor:
Roderic Grupen