
Leveraging continuity of videos for detection and clustering

Wednesday, 05/22/2019 1:00pm to 3:00pm
CS 140
Ph.D. Dissertation Proposal Defense
Speaker: SouYoung Jin

Abstract:

State-of-the-art deep-learning-based approaches to face recognition and object detection often require large amounts of annotated data for supervised or semi-supervised training. For video understanding applications, videos are more useful training data than still images because they contain richer information. However, since annotating videos is much more difficult than labeling images, unlabeled videos are often excluded from training for various tasks. In this thesis, we study how to leverage information from unlabeled videos using temporal continuity. Specifically, we present a fully automatic approach to grouping objects in videos and a method to automatically mine hard examples from videos to improve the performance of existing object detectors.

We first explore an unsupervised approach to object clustering, whose output clusters can ideally be used as pseudo-labels. Given an unlabeled video, our goal is to group all faces in the video by identity. Specifically, we introduce a novel clustering method motivated by the classic graph-theory results of Erdős and Rényi. It is based on two observations: a large cluster can be fully connected by linking just a small fraction of its point pairs, while a single link between two different people can merge their clusters and lead to poor clustering results.
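The classic Erdős–Rényi result behind the first observation can be stated for the random graph G(n, p), in which each of the n(n-1)/2 pairs among n points is linked independently with probability p. This is the textbook connectivity threshold, not a derivation from the thesis; the worked numbers use a hypothetical cluster of 1,000 faces, and because the limit is asymptotic, the 0.993 figure is only an approximation at finite n.

```latex
p = \frac{\ln n + c}{n}
\;\Longrightarrow\;
\lim_{n \to \infty} \Pr\bigl[\, G(n,p) \text{ is connected} \,\bigr] = e^{-e^{-c}}

n = 1000,\; c = 5:\quad
p \approx \frac{6.908 + 5}{1000} \approx 0.0119,\qquad
\binom{1000}{2} = 499{,}500,\qquad
\mathbb{E}[\#\text{edges}] \approx 0.0119 \times 499{,}500 \approx 5{,}950,\qquad
e^{-e^{-5}} \approx 0.993
```

Under this idealized independence assumption, linking only about 1.2% of a cluster's pairs is expected to connect it, which is why a linking rule can afford to miss many true pairs as long as it almost never links two different people.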

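We introduce a novel verification method with the property this scheme needs: it favors precision, linking two faces only when it is highly confident they belong to the same person. We use it in the clustering scheme and present state-of-the-art results on multiple video data sets and on standard face databases.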
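A minimal sketch of the clustering step, assuming clusters are read off as connected components of a graph whose edges are the face pairs a verifier accepts. The `UnionFind` and `cluster_by_links` names, the `(i, j, score)` triples, and the 0.9 threshold are hypothetical stand-ins for illustration, not the thesis's verifier or code. The toy usage shows both halves of the observation above: three confident links are enough to join four faces, while one false link above the threshold merges two identities.

```python
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_by_links(
    n: int,
    pair_scores: Iterable[Tuple[int, int, float]],
    threshold: float,
) -> List[List[int]]:
    """Group items 0..n-1 into connected components of the graph whose edges
    are the pairs with score >= threshold (hypothetical verifier scores).
    A high threshold favors precision: few true links are needed to connect
    a cluster, but any accepted false link merges two clusters."""
    uf = UnionFind(n)
    for i, j, score in pair_scores:
        if score >= threshold:
            uf.union(i, j)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())


# Toy example: faces 0-3 are one person, faces 4-6 another.
scores = [(0, 1, 0.95), (1, 2, 0.97), (2, 3, 0.93), (4, 5, 0.96), (5, 6, 0.94), (3, 4, 0.55)]
print(cluster_by_links(7, scores, threshold=0.9))  # [[0, 1, 2, 3], [4, 5, 6]]
print(cluster_by_links(7, scores + [(3, 4, 0.91)], threshold=0.9))  # [[0, 1, 2, 3, 4, 5, 6]]
```

Connected components are used here because they make the sensitivity described above explicit; the thesis's actual grouping procedure may differ in its details.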

To further improve pseudo-label generation, we demonstrate a method to automatically mine hard examples using the temporal continuity of videos. In particular, we analyze the output of a trained detector on video sequences and mine detections that are isolated in time (for example, a detection that appears in one frame with no matching detection in the neighboring frames), since these are likely to be hard examples. Our experiments show that retraining detectors on these automatically obtained examples often significantly improves performance. We present experiments on multiple architectures and multiple data sets, covering face detection, pedestrian detection, and other object categories.
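A minimal sketch of temporal-isolation mining, under assumptions that are ours rather than the thesis's: detections are axis-aligned boxes `(x1, y1, x2, y2)` listed per frame, and a detection counts as isolated when no detection within `window` frames on either side overlaps it with IoU of at least `iou_thresh`. The function names and both parameter values are hypothetical. The sketch ignores camera motion and shot cuts, either of which can make genuine detections look isolated.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def isolated_detections(
    frames: List[List[Box]], window: int = 1, iou_thresh: float = 0.3
) -> List[Tuple[int, int]]:
    """Return (frame_index, box_index) for every detection with no
    overlapping detection (IoU >= iou_thresh) in any other frame within
    `window` frames on either side. Such detections are candidate hard
    examples; in a one-frame video every detection is reported."""
    isolated = []
    for t, boxes in enumerate(frames):
        lo, hi = max(0, t - window), min(len(frames) - 1, t + window)
        for k, box in enumerate(boxes):
            has_support = any(
                iou(box, other) >= iou_thresh
                for s in range(lo, hi + 1)
                if s != t
                for other in frames[s]
            )
            if not has_support:
                isolated.append((t, k))
    return isolated


# Toy example: one object drifts slowly; a second box flickers in frame 2 only.
frames = [
    [(10, 10, 50, 50)],
    [(12, 10, 52, 50)],
    [(14, 10, 54, 50), (200, 200, 240, 240)],
    [(16, 10, 56, 50)],
    [(18, 10, 58, 50)],
]
print(isolated_detections(frames))  # [(2, 1)]
```

Whether a mined detection is then treated as a false positive or as a genuine but hard-to-track object is a separate decision that this sketch leaves open.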

Advisor: Erik Learned-Miller