Segmentation of Independently Moving Objects in Video

Monday, 05/21/2018 10:00am to 12:00pm
Computer Science Building, Room 151
Ph.D. Dissertation Proposal Defense
Speaker: Pia Bideau

"We see because we move; we move because we see" - James J. Gibson

The ability to recognize motion is one of the most important functions of our visual system. Motion helps us both to recognize objects and to better understand the 3D world we move through. Because of this central role, motion is used in computer vision to answer a wide variety of fundamental questions, such as: (1) Which objects are moving independently in the world? (2) What is the scene geometry? (3) How is the camera moving? In this work we focus mainly on the first question and develop a fully automatic method to segment independently moving objects in videos.

There are a variety of video segmentation methods that take motion into account. Clustering-based approaches group the observed motion by its direction or speed. Other methods focus on modeling motion correctly according to the physics of perspective geometry. Methods based solely on perspective projection are often limited because camera intrinsics, such as the focal length, are not known. Newer learning-based methods using convolutional neural networks profit from their strength in general object segmentation but do not take the physics of perspective projection into account. Rather than modeling motion precisely, they learn more general motion patterns that do not depend on camera parameters. This has two advantages: no knowledge of the camera parameters is required, and very complex object motions that are hard to model because of their ambiguity can be learned. On the other hand, useful information about the geometry of the scene is not taken into account.
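
The dependence on camera intrinsics can be made concrete with a standard textbook result. Consider a pinhole camera that translates with velocity T = (Tx, Ty, Tz) and does not rotate. A static point at depth Z then moves in the image with u = (x Tz - f Tx) / Z and v = (y Tz - f Ty) / Z, where (x, y) is measured from the principal point and f is the focal length in pixels. The minimal sketch below uses hypothetical function names and assumes pure translation. It shows that depth cancels out of the flow direction, but the focal length does not. A geometric model therefore has to know or estimate f.

```python
import math


def translational_flow(x, y, T, f, Z):
    """Motion-field vector (u, v) at pixel (x, y) for a static point at depth Z.

    Assumes a pinhole camera translating with velocity T = (Tx, Ty, Tz) and
    not rotating; (x, y) are measured from the principal point and f is the
    focal length in pixels.
    """
    Tx, Ty, Tz = T
    u = (x * Tz - f * Tx) / Z
    v = (y * Tz - f * Ty) / Z
    return u, v


def flow_angle(x, y, T, f, Z=1.0):
    """Direction of the motion-field vector in radians.

    Any positive depth Z scales u and v by the same factor, so the angle
    does not depend on Z, but it does depend on the focal length f.
    """
    u, v = translational_flow(x, y, T, f, Z)
    return math.atan2(v, u)


def focus_of_expansion(T, f):
    """Image point from which translational flow radiates (requires Tz != 0)."""
    Tx, Ty, Tz = T
    return f * Tx / Tz, f * Ty / Tz


if __name__ == "__main__":
    T = (1.0, 0.0, 1.0)  # hypothetical camera translation
    for f in (500.0, 1000.0):
        deg = math.degrees(flow_angle(100.0, 50.0, T, f))
        print(f"f={f:.0f}px  FOE={focus_of_expansion(T, f)}  angle={deg:.2f} deg")
```

Running the example prints a different flow angle at the same pixel for the two focal lengths, even though the camera motion is identical. This is why a method that relies on perspective geometry alone struggles when f is unknown.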

In our work we present a probabilistic approach to analyzing the image motion caused by the observer and by objects moving in the world. We coherently combine (a) cues from perspective projection, (b) probabilistic models of object motion, and (c) a convolutional neural network that adds "objectness" knowledge. First, the observer's motion is estimated by maximizing a motion field likelihood. We then segment the frame into its independently moving objects, guided by a convolutional neural network for object segmentation. We show that a careful analysis of the motion field not only leads to a consistent segmentation of moving objects in a video sequence but also helps us understand the geometry of the scene we are moving through.
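
The two steps above are estimating the observer's motion by maximizing a likelihood and then labeling independently moving pixels with the help of an objectness cue. A heavily simplified sketch of that general idea follows. It is not the proposed method. The modeling choices are our own assumptions:

- The camera only translates and never rotates.
- Only optical-flow angles are used.
- Background angle noise follows a von Mises distribution with a fixed concentration kappa.
- The translation is found by brute-force grid search with Tz fixed to 1.
- Independently moving pixels have uniformly distributed angles.
- The CNN objectness score is replaced by a constant prior, prior_moving.

All function names are hypothetical.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0(x) from its power series sum_k ((x/2)^k / k!)^2."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def von_mises_logpdf(theta, mu, kappa):
    """Log-density of a von Mises distribution on angles (mean mu, concentration kappa)."""
    return kappa * math.cos(theta - mu) - math.log(2.0 * math.pi * bessel_i0(kappa))


def predicted_angle(x, y, T, f):
    """Flow direction at pixel (x, y) for pure camera translation T, focal length f.
    Any positive depth scales both components equally, so it drops out."""
    Tx, Ty, Tz = T
    return math.atan2(y * Tz - f * Ty, x * Tz - f * Tx)


def estimate_translation(observations, candidates, f, kappa):
    """Grid search (an assumption) for the translation whose predicted flow
    angles maximize the summed log-likelihood of observations [(x, y, theta)]."""
    best_T, best_ll = None, -math.inf
    for T in candidates:
        ll = sum(von_mises_logpdf(theta, predicted_angle(x, y, T, f), kappa)
                 for x, y, theta in observations)
        if ll > best_ll:
            best_T, best_ll = T, ll
    return best_T


def moving_posterior(x, y, theta, T, f, kappa, prior_moving):
    """Posterior probability that a pixel moves independently of the camera.

    Background angles: von Mises around the predicted angle. Independent
    motion: uniform over all angles (assumed). prior_moving stands in for a
    per-pixel objectness score, e.g. from a segmentation CNN (hypothetical).
    """
    lik_bg = math.exp(von_mises_logpdf(theta, predicted_angle(x, y, T, f), kappa))
    lik_mv = 1.0 / (2.0 * math.pi)
    num = prior_moving * lik_mv
    return num / (num + (1.0 - prior_moving) * lik_bg)


if __name__ == "__main__":
    f, kappa = 500.0, 20.0     # assumed focal length (px) and noise concentration
    T_true = (0.2, -0.1, 1.0)  # synthetic forward-moving camera
    obs = []
    for x in range(-275, 276, 50):
        for y in range(-175, 176, 50):
            theta = predicted_angle(x, y, T_true, f)
            if x <= -225 and y >= 125:  # small "object" whose flow points back toward the FOE
                theta += math.pi
            obs.append((float(x), float(y), theta))

    steps = [round(-0.5 + 0.1 * i, 1) for i in range(11)]
    candidates = [(tx, ty, 1.0) for tx in steps for ty in steps]  # assumes Tz > 0
    T_hat = estimate_translation(obs, candidates, f, kappa)
    flagged = [(x, y) for x, y, th in obs
               if moving_posterior(x, y, th, T_hat, f, kappa, 0.5) > 0.5]
    print("estimated translation:", T_hat)
    print("pixels flagged as independently moving:", flagged)
```

Fixing Tz = 1 loses nothing for forward motion. Flow angles only constrain the direction of translation, not its magnitude, so any T with Tz > 0 can be rescaled to Tz = 1. A complete system would also have to handle camera rotation and an unknown focal length. It would also need a real per-pixel objectness map in place of the constant prior.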

Advisor: Erik Learned-Miller