Zoom Meeting: https://umass-amherst.zoom.us/j/99502935840
Meeting ID: 995 0293 5840
Recent advances in computer vision are in part due to the widespread use of deep neural networks trained on massive labeled datasets. However, collecting such labeled data can be a bottleneck for many applications. This thesis proposes several approaches to mitigate this problem, enabling learning from limited supervision.
We first present ways to improve transfer learning to heterogeneous modalities by exploiting computer graphics techniques to generate aligned cross-modal data. By forcing agreement of predictions across modalities, we show that more accurate models can be trained for analyzing line drawings, grayscale or low-resolution images, and even 3D data represented as voxels or point clouds.
We then analyze how to improve few-shot learning by exploiting unlabeled data. We show that performance on these tasks can be boosted by combining unsupervised objectives with meta-learning objectives. We also find that small amounts of domain-specific data can be more beneficial to a given task, and propose a technique to select such data.
The lack of realistic benchmarks in the literature has led to the study of transfer learning, unsupervised learning, and few-shot learning in isolation. In the last part of the thesis, we propose a few-shot learning benchmark that exposes some of the challenges often encountered in practice, and we analyze the effectiveness of existing techniques on it. We will summarize these experiments along with insights from two recent Kaggle challenges we ran at CVPR.
Advisor: Subhransu Maji