Content

Speaker:

Deep Chakraborty

Abstract:

Human beings are known to learn rich representations of the visual world simply by being exposed to it, without the constant need for explicit supervision. Although the mechanisms underlying this ability are not fully understood, self-supervised learning (SSL) is an attempt to endow machines with a similar ability to learn from unlabeled data using proxy tasks. While SSL has become a powerful paradigm for visual representation learning, important questions remain about whether the learned representations are sufficiently expressive for downstream tasks, how such representations can be improved after pre-training, and how the composition of the pre-training data itself affects their usefulness. This thesis studies these questions from three complementary perspectives: scientific applications, representation desiderata and methods for improvement, and data-curation principles for real-world scenarios.

First, we apply SSL to the task of automatic terrain categorization in Martian terrain images. Using deep self-supervised clustering, we show how neural networks can learn meaningful geomorphological and textural features from a large pool of unlabeled images, enabling the creation of a large dataset of Martian terrain images with scientifically relevant and granular terrain categories without the need for exhaustive expert annotation. We further validate the scientific significance of the discovered categories with expert input and create a detailed terrain taxonomy to encourage more fundamental analyses in the future.
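The abstract does not spell out the clustering pipeline, but the general idea behind deep self-supervised clustering is to extract features with a network, cluster them, and treat the cluster assignments as pseudo-labels. As a loose, hypothetical illustration (plain k-means on toy feature vectors, not the thesis's actual method or data):

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Plain k-means: returns a cluster assignment (pseudo-label) per row."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest center.
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned points
        # (keep the old center if a cluster happens to be empty).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

# Toy "embeddings": two well-separated blobs standing in for deep features
# of two terrain types. In a real pipeline these would come from a network.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.1, (50, 8)), rng.normal(5, 0.1, (50, 8))])
pseudo_labels = kmeans(feats, k=2)
```

In DeepCluster-style training, such pseudo-labels would then supervise a classification head, and the clustering would be refreshed as the features improve; here the sketch only shows the labeling step.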

Second, we propose a general-purpose information-theoretic criterion for improving the quality of already-trained, highly optimized SSL representations. By maximizing the entropy of SSL-trained embeddings in a compact space, we prepare them as well as possible for future, as-yet-unknown discrimination tasks. Because traditional entropy maximization relies on unreliable high-dimensional estimates, our criterion is defined in terms of easy-to-estimate, low-dimensional constraints. We show that even without explicitly enforcing higher-dimensional constraints, this approach yields practically useful embeddings and improves downstream performance with only a few additional training epochs.
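The precise criterion is defined in the thesis; as a rough, hypothetical illustration of why low-dimensional estimates are attractive, one can average cheap one-dimensional histogram entropies over random projections of the embeddings (a "sliced" surrogate for the hard-to-estimate joint entropy; the function below is illustrative, not the thesis's criterion):

```python
import numpy as np

def sliced_entropy(embeddings, n_projections=64, bins=32, seed=0):
    """Average 1-D histogram entropy over random projection directions.

    Each projection yields an easy, low-dimensional entropy estimate;
    averaging many of them gives a tractable surrogate objective that
    rewards embeddings spread out over the space.
    """
    rng = np.random.default_rng(seed)
    d = embeddings.shape[1]
    # Random unit directions along which to project.
    dirs = rng.normal(size=(n_projections, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    ents = []
    for u in dirs:
        proj = embeddings @ u
        # Fixed histogram range so that collapsed embeddings are not
        # rescaled to look as spread-out as well-dispersed ones.
        counts, _ = np.histogram(proj, bins=bins, range=(-4.0, 4.0))
        p = counts / counts.sum()
        p = p[p > 0]
        ents.append(-(p * np.log(p)).sum())
    return float(np.mean(ents))

rng = np.random.default_rng(0)
collapsed = rng.normal(size=(2000, 16)) * 0.01  # embeddings squeezed together
spread = rng.normal(size=(2000, 16))            # embeddings filling the space
```

A training loop could maximize such a surrogate (made differentiable, e.g. with soft histograms or spacing estimators) for a few epochs on top of frozen SSL features; the point of the sketch is only that per-projection entropies are cheap and reliable where the joint entropy is not.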

Finally, we study how the composition of pre-training data shapes the usefulness of SSL representations for downstream tasks. Specifically, we examine how the diversity and relevance of the pre-training images, viewed from the perspective of a downstream task, affect representation learning and final performance on that task. Through carefully controlled experiments that minimize confounding factors, we derive practical data-curation insights for SSL practitioners operating under limited compute budgets or limited access to in-domain data.

Advisor:

Erik Learned-Miller

Hybrid event posted in PhD Thesis Defense for Faculty