Acquiring Rich Models of Objects and Space Through Vision and Natural Language

Wednesday, 02/06/2013 11:00am to 12:00pm

Matthew Walter
Massachusetts Institute of Technology
Computer Science and Artificial Intelligence Laboratory (CSAIL)

Computer Science Building, Room 151

Faculty Host: Rod Grupen

Over the past decade, we have seen robots move off factory floors and into unstructured environments. The age of robots operating in controlled isolation is giving way to a new generation of agile robots that assist people in broad areas that include sub-sea exploration, material handling, and healthcare. Recent advances in navigation, estimation, planning, and control enable robots to function under greater degrees of uncertainty. However, our means of controlling these robots is currently limited to low-level teleoperation. The challenge we face today is to make robots that people can command naturally, in unstructured settings. This capability requires that robots be situationally aware: able to form spatially extended, temporally persistent models of their surround at a level similar to that of their human partners.

This talk describes recent advances in robot situational awareness that enable robots to acquire rich models of objects and of their environment from natural interactions with their human partners. First, I describe a perception algorithm that allows mobile robots to build and maintain a representation of objects in their surround using only image sequences and estimates of the robot's inter-frame motion. The algorithm robustly detects these learned objects over time despite the challenges of uncontrolled lighting and viewpoint variation typical of unprepared environments. The novelty of the algorithm is its opportunistic use of the robot's motion to capture varying object appearance. Unlike existing vision-based object detection strategies that require extensive offline training, our algorithm builds this catalogue from a single training example. This capability enables richer command and control mechanisms that include the use of spoken natural language to command a 3000 kg mobile manipulator.

Second, I describe an algorithm that enables robots to efficiently learn human-centric models of their environment from natural language descriptions. Underlying the algorithm is a probabilistic model that provides a common framework in which to integrate concepts from natural language descriptions with the metric information that the robot's low-level sensor streams convey. I present an efficient means for performing inference over this model and demonstrate that it yields environment models that are more accurate metrically, topologically, and semantically.
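To make the first idea concrete, here is a minimal sketch (not the speaker's implementation) of one-shot, motion-assisted object modeling: an appearance model is seeded from a single labeled example, and new views are added opportunistically whenever the robot's estimated inter-frame motion indicates the viewpoint has changed enough to reveal new appearance. The feature representation, thresholds, and class names below are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class OneShotObjectModel:
    """Multi-view appearance model seeded from a single training example.

    Hypothetical sketch: real systems would use learned image features;
    here each "view" is just a feature vector.
    """

    def __init__(self, label, first_view, min_motion=0.5, match_thresh=0.8):
        self.label = label
        self.views = [first_view]        # one feature vector per stored viewpoint
        self.min_motion = min_motion     # camera travel (m) required before adding a view
        self.match_thresh = match_thresh # similarity needed to declare a detection
        self._travel = 0.0               # motion accumulated since the last stored view

    def detect(self, features):
        """Detected if any stored view matches the observed features."""
        return max(cosine(v, features) for v in self.views) >= self.match_thresh

    def update(self, features, inter_frame_motion):
        """Track the object; opportunistically store a new view once the
        robot has moved far enough to see a different appearance."""
        self._travel += inter_frame_motion
        if self._travel >= self.min_motion and self.detect(features):
            self.views.append(features)
            self._travel = 0.0
```

The design choice this sketch illustrates is that the robot's own ego-motion estimate, rather than an offline training set, supplies the viewpoint diversity: each sufficiently novel vantage point contributes one more view to the catalogue, so detection becomes more robust to viewpoint change as the robot moves.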

A reception will be held at 3:40 in the atrium, outside the presentation room.