PhD Dissertation Proposal: Deep Chakraborty, Information-Theoretic Methods for Understanding and Improving Representations in Neural Networks
Speaker:
Deep Chakraborty
Abstract:
Human beings are known to learn rich representations of the visual world simply by being exposed to it, without the constant need for explicit supervision. While the exact process through which this is achieved remains largely unknown, Self-Supervised Learning (SSL) is a reasonable attempt at endowing machines with this ability, enabling them to learn from unlabeled data using proxy tasks. Once learned, these representations can be used to solve a wide variety of visual recognition tasks with light to moderate amounts of supervision. Due to imperfections in the SSL process, however, it is sometimes unclear whether the learned representations are expressive enough to solve downstream tasks that may be as yet unknown. A further question is whether these representations can facilitate the training of downstream supervised models that manage uncertainty well.
To address the first issue, we will, in the first part of this thesis, formulate a general-purpose information-theoretic criterion for further improving already-trained, highly optimized SSL representations. We show that, by maximizing the entropy of SSL-trained embeddings in a compact space, we prepare them as well as possible for future, as-yet-unknown discrimination tasks. Moreover, since traditional entropy maximization relies on unreliable high-dimensional estimates, our criterion is defined in terms of easy-to-estimate, low-dimensional constraints. Surprisingly, we find that, without explicitly enforcing higher-dimensional constraints in our criterion, we obtain embeddings that are practically useful and yield significant improvements on downstream tasks after just a few additional epochs of training.
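To make the general idea concrete, the sketch below shows one way a low-dimensional entropy criterion of this kind could be implemented: it projects L2-normalized embeddings onto random one-dimensional directions and maximizes a kernel-density estimate of each marginal's entropy. The function name, the random-projection scheme, and all hyperparameters are illustrative assumptions, not the specific criterion developed in the thesis.

```python
import math
import torch
import torch.nn.functional as F

def marginal_entropy_loss(embeddings, num_projections=32, bandwidth=0.1):
    """Hypothetical low-dimensional entropy surrogate (a sketch, not the
    thesis criterion). Estimates the entropy of random 1-D marginals of
    the embeddings with a Gaussian KDE; minimizing the returned loss
    (i.e., maximizing marginal entropy) flattens each marginal."""
    z = F.normalize(embeddings, dim=1)                 # (N, D), embeddings on the unit sphere
    n, d = z.shape
    dirs = F.normalize(torch.randn(d, num_projections, device=z.device), dim=0)
    proj = z @ dirs                                    # (N, P): P random 1-D marginals
    # Pairwise differences within each projection, for a Gaussian KDE.
    diffs = proj.unsqueeze(0) - proj.unsqueeze(1)      # (N, N, P)
    log_kernel = -0.5 * (diffs / bandwidth) ** 2
    # Resubstitution estimate: log-density of each sample under the KDE.
    # The kernel's normalizing constant is dropped; it only shifts the
    # loss by a constant and does not affect optimization.
    log_density = torch.logsumexp(log_kernel, dim=1) - math.log(n)
    entropy = -log_density.mean()                      # average -log p(x) ≈ marginal entropy
    return -entropy                                    # minimize the negative entropy
```

A loss of this form could be minimized for a few additional epochs on top of frozen or lightly fine-tuned SSL embeddings, which matches the "few additional epochs of training" regime described above.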
To address the second issue, in our current and upcoming work we study the problem of machine learning with a reject option. In this setting, a model can abstain from making a prediction when it is likely to make a mistake, possibly because the input does not contain enough information to draw a conclusion. To deal with prediction uncertainty, we leverage the entropy of the label distribution over an ensemble of classification models trained on top of the maximum-entropy representations described above. This allows us to maximize the diversity of the ensemble's predictions and, in turn, reveal their discrepancies on difficult or ambiguous inputs. While this approach is useful in a wide range of settings, we apply it to the problem of reading the time from an analog clock, a task that, while relatively simple for humans, befuddles even the best large Vision-Language Models.
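As a rough illustration of such a rejection rule, the sketch below averages softmax outputs over an ensemble and abstains when the entropy of the averaged label distribution exceeds a threshold. The function name and threshold value are hypothetical; the actual models and decision rule in the proposed work may differ.

```python
import torch

def predict_with_reject(models, x, entropy_threshold=0.5):
    """Ensemble prediction with a reject option (illustrative sketch).
    `models` is assumed to be a list of classifiers returning logits."""
    with torch.no_grad():
        # Average the ensemble's predictive distributions: (B, C).
        probs = torch.stack([m(x).softmax(dim=-1) for m in models]).mean(dim=0)
    # Entropy of the averaged label distribution per input.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    preds = probs.argmax(dim=-1)
    reject = entropy > entropy_threshold   # abstain on high-uncertainty inputs
    return preds, reject
```

When ensemble members disagree on an ambiguous input, the averaged distribution spreads out, its entropy rises, and the prediction is rejected rather than returned.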
Advisor:
Erik Learned-Miller