Modeling the Multi-mode Distribution in Self-Supervised Language Models

Monday, 01/31/2022 5:00pm to 7:00pm

PhD Dissertation Proposal Defense

Abstract: Self-supervised large language models (LMs) have become a highly influential, foundational tool for many NLP models, which makes their expressivity an important topic of study. In near-universal practice, given the language context, the model predicts a word from the vocabulary using a single vector embedding for the context and a single vector embedding for each dictionary entry. The context, however, sometimes implies that the distribution over predicted words should be multi-modal in the embedding space, and a single-vector representation of the context provably fails to capture such a distribution. To address this limitation, we propose to represent the context with multiple vector embeddings, which we term facets. This is distinct from previous work on multi-sense vocabulary embeddings, which employs multiple vectors for the dictionary entries, not the context.

In this dissertation, we first present the theoretical limitations of the single context embedding in LMs and show how the theoretical analysis suggests alternative softmax layers that encode a context as multiple embeddings. The proposed alternative achieves better perplexity than the mixture of softmaxes (MoS) without adding significant computational cost to LMs. Beyond predicting the next or masked word, we also use multiple CLS embeddings to improve state-of-the-art pretraining methods for BERT on natural language understanding (NLU) benchmarks, especially when the datasets are small, without introducing significant extra parameters or computation. Furthermore, we show that our multi-facet contexts improve the measurement of sentence similarity, the extraction of important words in a sentence or document, distantly supervised relation extraction, unsupervised text pattern entailment detection, and cold-start citation recommendation. Finally, we use the multiple vector embeddings to predict the future topics of a context and, building on this, propose a novel interactive language generation framework.
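
The single-embedding limitation described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the dissertation: the four 2-D word embeddings, the helper names (`single_embedding_hits`, `multi_facet_probs`), and the uniform averaging of per-facet softmaxes, which is a simple mixture-of-softmaxes-style stand-in rather than the proposed softmax layer. The embeddings are chosen so that queen - king = woman - man, in which case no single context vector can rank both "king" and "woman" above both "queen" and "man", while two facets can.

```python
import numpy as np

# Hypothetical toy embeddings (axes: royalty, gender), chosen so that
# queen - king == woman - man, i.e. king + woman == queen + man.
WORDS = ["king", "queen", "man", "woman"]
E = np.array([
    [1.0, -1.0],   # king
    [1.0, 1.0],    # queen
    [-1.0, -1.0],  # man
    [-1.0, 1.0],   # woman
])


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()


def top2(probs):
    """Return the set of the two most probable words."""
    return {WORDS[i] for i in np.argsort(probs)[-2:]}


def single_embedding_hits(target, trials=10000, seed=0):
    """Count random single context vectors whose logits rank every word
    in `target` strictly above every word outside `target`."""
    rng = np.random.default_rng(seed)
    inside = [i for i, w in enumerate(WORDS) if w in target]
    outside = [i for i, w in enumerate(WORDS) if w not in target]
    hits = 0
    for h in rng.normal(size=(trials, 2)):
        logits = E @ h
        if logits[inside].min() > logits[outside].max():
            hits += 1
    return hits


def multi_facet_probs(facets):
    """Average the softmax distributions produced by each facet
    (uniform mixture weights, an assumption of this sketch)."""
    return np.mean([softmax(E @ h) for h in facets], axis=0)


target = {"king", "woman"}
hits = single_embedding_hits(target)

# Two facets: one pointing toward "king", one toward "woman".
facets = [2.0 * E[0], 2.0 * E[3]]
mixture = multi_facet_probs(facets)
```

In this sketch `hits` stays at zero because the logits of "king" and "woman" always sum to the same value as those of "queen" and "man", so both cannot be strictly larger at once. With the two facets, `top2(mixture)` contains exactly "king" and "woman".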

Advisor: Andrew McCallum

JOIN VIA ZOOM
