Advances in Segmentation for Video Understanding

20 Feb
Wednesday, 02/20/2013 11:00am to 12:00pm

Jason Corso
SUNY at Buffalo
Computer Science and Engineering

Computer Science Building, Room 151

Faculty Host: Erik Learned-Miller

Video understanding is at the forefront of modern computer vision. The lay and technical communities alike are drowning in video; YouTube, for example, reports 72 hours of video uploaded each minute. Transforming these videos into a usable form, such as video-to-text, is paramount to making good use of this rich data. However, the computer vision community has struggled to find an appropriate representation on which to base video analysis methods. Most modern techniques are based on low-level features, which have little or no semantic interpretation and depend on large annotated data sets to perform well. In contrast, video segmentation as an early processing step offers a complementary, more semantically rich, low-level representation on which to base further processing. Yet the adoption of segmentation in video has lagged behind that of segmentation in images, likely due to a lack of critical analysis of video segmentation methods and the absence of methods that perform well on long video streams.

In this talk, I will present recent work in my group that addresses these limitations of video segmentation as an early step in video understanding. The first part of the talk will discuss an approximation framework for streaming hierarchical video segmentation, which bounds the memory required for processing to a small constant while retaining the high-quality performance of the full-video segmentation method. Second, I will present our metric learning work, which can be used to learn the distance function between pixels in the video and segments in the hierarchy rather than relying on an ad hoc distance. I will also discuss a new criterion for flattening the video segmentation hierarchy, based on the notion of uniform motion entropy, which helps avoid the representational explosion that comes from using the whole hierarchy for later processing. Finally, I will conclude with video-to-text examples that demonstrate the strong potential of a more semantically rich low-level representation, and with a vision for where I see video understanding going in the next decade.


Corso is an assistant professor in the Computer Science and Engineering Department of SUNY at Buffalo. He received his Ph.D. in Computer Science from The Johns Hopkins University in 2005. From 2005 to 2007, Corso was a post-doctoral research fellow in neuro-imaging and statistics at the University of California, Los Angeles. He is a recipient of the Army Research Office Young Investigator Award (2010), the NSF CAREER Award (2009), the SUNY Buffalo Young Investigator Award (2011), and the Link Foundation Fellowship in Advanced Simulation and Training (2004), and was a member of the 2009 DARPA Computer Science Study Group. He has served as an Associate Editor of Computer Methods and Programs in Biomedicine since 2009. Corso has authored more than sixty papers on topics including computer vision, medical imaging, robotics, computational biomedicine, machine intelligence, statistical learning, perceptual interfaces, and smart environments. He is PI on more than $5 million in research funding from major federal agencies, including NSF, NIH, DARPA, ARO, and IARPA.

A reception will be held at 3:40 in the atrium, outside the presentation room.