Exploiting Concepts In Videos For Video Event Detection

Friday, 06/12/2015 6:00am to 8:00am
Ph.D. Seminar

Ethem Can

Computer Science Building, Room 151

Video event detection is the task of searching videos for events of interest to a user, where an event is a complex activity occurring at a specific place and time. The video event detection problem has grown in importance as the amount of online video increases by more than 300 hours every minute on YouTube alone.

In this thesis, we tackle three major video event detection problems: video event detection with exemplars (VED-ex), where a large number of example videos are associated with queries; video event detection with few exemplars, in which only a small number of example videos are associated with queries; and zero-shot video event detection (VED-zero), where no exemplar videos are associated with queries.

We first define a new way of describing videos concisely, one that is built around query-independent concepts (i.e., a fixed set of concepts shared across all queries) with a space-efficient representation. Using query-independent concepts enables us to learn a retrieval model for any query without requiring a new set of concepts. Our space-efficient representation reduces both the time required to train and test a retrieval model and the space needed to store video representations on disk.
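The idea above can be illustrated with a minimal sketch: per-frame concept-detector scores over a fixed vocabulary are aggregated into a small top-k sparse vector per video. The aggregation by max and the top-k truncation are illustrative assumptions, not the thesis's actual representation.

```python
import heapq

def sparse_video_repr(frame_scores, k=3):
    """Collapse per-frame concept scores (dicts mapping concept index
    to detector score) into a compact top-k sparse representation.
    Aggregation by max over frames is an illustrative choice."""
    agg = {}
    for frame in frame_scores:
        for idx, score in frame.items():
            agg[idx] = max(agg.get(idx, 0.0), score)
    # keep only the k strongest concepts to save space on disk
    top = heapq.nlargest(k, agg.items(), key=lambda kv: kv[1])
    return dict(top)
```

Because the same fixed concept vocabulary indexes every video, a retrieval model for a new query can be trained directly on these vectors without re-extracting features.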

When the number of example videos associated with queries decreases, retrieval accuracy decreases as well. We present a method that incorporates multiple one-exemplar models into video event detection, aiming to improve retrieval accuracy when only a few exemplars are available. By incorporating multiple one-exemplar models, we obtain significant improvements in mean average precision compared to a monolithic model trained on all exemplars at once.
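A minimal sketch of the fusion idea: score a candidate video against each exemplar independently, then combine the per-exemplar scores. The cosine similarity scorer and the simple average fusion are illustrative assumptions; the thesis's actual one-exemplar models and fusion scheme may differ.

```python
import math

def one_exemplar_score(exemplar, video):
    """Hypothetical one-exemplar model: cosine similarity between
    sparse concept vectors (dicts mapping concept -> score)."""
    dot = sum(exemplar.get(c, 0.0) * v for c, v in video.items())
    na = math.sqrt(sum(s * s for s in exemplar.values()))
    nb = math.sqrt(sum(s * s for s in video.values()))
    return dot / (na * nb) if na and nb else 0.0

def fused_score(exemplars, video):
    """Fuse several one-exemplar models by averaging their scores,
    one illustrative alternative to training a single monolithic model."""
    return sum(one_exemplar_score(e, video) for e in exemplars) / len(exemplars)
```

Each exemplar contributes its own model, so adding or removing an exemplar never requires retraining the others.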

Having no exemplar videos associated with queries makes the video event detection problem more challenging, as we cannot train a retrieval model from example videos. It is also more realistic, since compiling a set of example videos can be costly. We tackle this problem by providing a new and effective zero-shot video event detection model that exploits dependencies of concepts in videos. Our dependency work uses a Markov Random Field (MRF) based retrieval model and assumes three dependency settings: 1) full independence, where each concept is considered independently; 2) spatial dependence, where the co-occurrence of two concepts in the same video frame is treated as important; and 3) temporal dependence, where having concepts co-occur in consecutive frames is treated as important. Our MRF-based retrieval model improves retrieval accuracy significantly compared to the common bag-of-concepts approach with an independence assumption.

Advisors: James Allan & Manmatha