
Improving Visual Recognition with Unlabeled Data

Friday, 01/24/2020 1:00pm to 3:00pm
A311 LGRC
Ph.D. Thesis Defense

Abstract:

The success of deep neural networks has resulted in computer vision systems that obtain high accuracy on a wide variety of tasks, such as image classification, object detection, and semantic segmentation. However, most state-of-the-art vision systems depend on large amounts of labeled training data, which is not a scalable solution in the long run. This work focuses on improving existing models for visual object recognition and detection without depending on such large-scale human-annotated data.

We first show how large numbers of hard examples (cases where an existing model makes a mistake) can be obtained automatically from unlabeled video sequences by exploiting temporal consistency cues in the output of a pre-trained object detector. These examples can strongly influence a model's parameters when the network is re-trained to correct them, resulting in improved performance on several object detection tasks.
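The temporal consistency idea above can be illustrated with a minimal sketch. It is not the thesis's method: it assumes each frame's detections are given as plain (x1, y1, x2, y2) boxes and swaps in simple IoU matching across three consecutive frames in place of any tracker. The names `mine_hard_examples`, `has_match`, and the `thresh` parameter are hypothetical. A detection that appears in only one frame is flagged as a likely false positive, and a box present in the previous and next frames but absent in the current one is flagged as a likely missed detection.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def has_match(box, boxes, thresh):
    """True if any box in `boxes` overlaps `box` with IoU >= thresh."""
    return any(iou(box, other) >= thresh for other in boxes)


def mine_hard_examples(frames, thresh=0.5):
    """Flag temporally inconsistent detections (hypothetical helper).

    frames: list of per-frame detection lists, each box as (x1, y1, x2, y2).
    Returns (isolated, missed), each a list of (frame_index, box):
      isolated: detected in frame t but in neither neighbor
                (a likely false positive).
      missed:   detected in frames t-1 and t+1 but not in t
                (a likely missed detection in frame t).
    """
    isolated, missed = [], []
    for t in range(1, len(frames) - 1):
        prev, cur, nxt = frames[t - 1], frames[t], frames[t + 1]
        for box in cur:
            if not has_match(box, prev, thresh) and not has_match(box, nxt, thresh):
                isolated.append((t, box))
        for box in prev:
            if has_match(box, nxt, thresh) and not has_match(box, cur, thresh):
                missed.append((t, box))
    return isolated, missed
```

In this sketch the flagged boxes would serve as automatically obtained training examples for re-training; the threshold and the three-frame window are illustrative choices only.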

Next, we focus on the unsupervised adaptation of an existing object detector to a new domain with no labeled data, assuming that a large number of unlabeled videos are readily available. A modified knowledge distillation loss is proposed for re-training the original model on automatically obtained hard and easy examples from the target domain. Our approach is evaluated on challenging face and pedestrian detection tasks involving large domain shifts, showing improved performance with minimal dependence on hyper-parameters.
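For readers unfamiliar with distillation, the sketch below shows the standard (unmodified) form of a distillation loss for a single detection score; the specific modification proposed in the thesis is not described in this abstract and is not reproduced here. The function names, the mixing weight `lam`, and the use of binary cross-entropy are assumptions made for illustration.

```python
import math


def binary_cross_entropy(target, pred, eps=1e-7):
    """Cross-entropy between a target in [0, 1] and a predicted probability."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def distillation_loss(student_score, teacher_score, pseudo_label, lam=0.5):
    """Standard distillation loss for one detection (illustrative only).

    student_score: probability predicted by the model being re-trained.
    teacher_score: probability from the original (pre-trained) detector.
    pseudo_label:  automatically obtained hard label, 0.0 or 1.0.
    lam:           hypothetical weight; 1.0 uses only the hard label,
                   0.0 uses only the teacher's soft score.
    """
    hard_term = binary_cross_entropy(pseudo_label, student_score)
    soft_term = binary_cross_entropy(teacher_score, student_score)
    return lam * hard_term + (1.0 - lam) * soft_term
```

Mixing the two terms lets the automatically obtained label correct the original detector where it was wrong, while the teacher's score tempers the effect of a noisy label.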

Finally, we address the problem of face recognition, which has achieved high accuracy by employing deep neural networks trained on massive labeled datasets. Further improvements through supervised learning require significantly larger datasets and hence massive annotation efforts. We improve upon the performance of face recognition models trained on large-scale labeled datasets by using unlabeled faces as additional training data. We present insights and recipes for training deep face recognition models with labeled and unlabeled data at scale, addressing real-world challenges such as overlapping identities between the labeled and unlabeled datasets, as well as label noise introduced by clustering errors.

Committee:

Erik Learned-Miller, Chair

Subhransu Maji, Member

Liangliang Cao, Member

David Huber, Outside Member