
Understanding the Role of Context in Neural Language Models

Tuesday, 05/10/2022 12:00pm to 2:00pm
PhD Dissertation Proposal Defense
Speaker: Simeng Sun


Neural language models (NLMs), which probabilistically predict the identity of a word given the immediately preceding context, form the core of modern natural language processing (NLP). Despite the success of adapting these models to many downstream NLP tasks, they remain limited in how well they exploit context to predict the target word. For instance, because the Transformer LM's cost grows quadratically with sequence length, long documents are typically chunked into small non-overlapping pieces, which precludes learning discourse-level long-range dependencies. Moreover, the next-token prediction objective is at odds with learning dependencies that span long sequences, which casts doubt on an NLM's ability to generate globally coherent output.
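To make the quadratic bottleneck concrete, here is a minimal NumPy sketch (our illustration, not material from the talk): causal self-attention over a length-n sequence builds an n-by-n score matrix, while splitting the document into non-overlapping chunks keeps the cost linear in document length, at the price of no attention across chunk boundaries.

```python
import numpy as np

def attention_weights(n, d=16, rng=None):
    """Naive causal self-attention over a length-n sequence.
    The score matrix is n x n, so time and memory grow as O(n^2)."""
    rng = rng or np.random.default_rng(0)
    q = rng.standard_normal((n, d))
    k = rng.standard_normal((n, d))
    scores = q @ k.T / np.sqrt(d)                      # shape (n, n)
    # causal mask: position i may only attend to positions <= i
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

def chunked_cost(n, chunk):
    """Score-matrix entries computed when a length-n document is split
    into non-overlapping chunks: each chunk costs chunk^2, so the total
    is linear in n -- but no attention crosses a chunk boundary."""
    full_chunks, rem = divmod(n, chunk)
    return full_chunks * chunk**2 + rem**2

w = attention_weights(8)
print(w.shape)                  # (8, 8): full attention is quadratic
print(chunked_cost(4096, 512))  # 8 * 512^2 = 2097152, vs 4096^2 = 16777216
```

The second print shows why chunking is attractive for training efficiency, and the causal mask shows exactly what is lost: every weight linking a token to a position in an earlier chunk is simply never computed.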

In this thesis, we aim to (1) understand the role of context in neural autoregressive language models and (2) design NLMs that efficiently and effectively model both short- and long-range context. We begin with an analysis of long-range language models (LRLMs) on the long-sequence benchmark PG-19. Our analysis reveals that the reported perplexity gains of LRLMs originate primarily from better modeling of local context, and that LRLMs are insensitive to a diverse array of perturbations applied to distant tokens. We also observe that these models struggle with sequence-level tasks that require a deep understanding of long-range context. We thus propose ChapterBreak, a sequence-level challenge benchmark that tests language modeling capability at discourse boundaries and on which all recent LRLMs struggle.

Next, we move on to designing better NLMs, beginning with two simple approaches to modeling context: 1D convolution and fixed attention. Both improve training efficiency, though the former at the cost of decreased performance relative to standard Transformer LMs. As ongoing work, we are improving short-range language models by augmenting their input with compressed latent context vectors, in the hope of bypassing the long-sequence bottleneck in Transformer LMs and recovering context information beyond the truncated short-range context window.

Finally, we propose to evaluate the global coherence of long-form output with a sequence-level language model, SuffixLM. As a stretch goal, we intend to explore several document-level downstream tasks (e.g., document-level machine translation, long-form QA, and event extraction) and evaluate the efficacy of incorporating long-range context into those tasks. We hope that this thesis will bridge the gap between current token-centric LM research and research into discourse-level models.
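The perturbation analysis above can be illustrated with a deliberately simple analogy (a toy of our own, not the dissertation's actual experimental setup): a bigram LM conditions on only one token of local context, so shuffling the distant prefix of an evaluation sequence cannot change its score for the final token. A genuinely long-range model should, by contrast, be sensitive to such perturbations.

```python
import math
import random
from collections import Counter

def train_bigram(tokens):
    """Toy maximum-likelihood bigram LM with add-one smoothing."""
    vocab_size = len(set(tokens))
    pair = Counter(zip(tokens, tokens[1:]))
    uni = Counter(tokens[:-1])
    def prob(prev, cur):
        return (pair[(prev, cur)] + 1) / (uni[prev] + vocab_size)
    return prob

def last_token_ppl(prob, tokens):
    """Perplexity of the final token given its preceding context.
    A bigram model only ever looks at the immediately preceding token."""
    return math.exp(-math.log(prob(tokens[-2], tokens[-1])))

text = "the cat sat on the mat and the dog sat on the rug".split()
prob = train_bigram(text)

# Perturb the *distant* context: shuffle everything except the last
# two tokens of the evaluation sequence.
prefix = text[:-2]
random.Random(0).shuffle(prefix)
perturbed = prefix + text[-2:]

# A local-context model is blind to the perturbation:
print(last_token_ppl(prob, text) == last_token_ppl(prob, perturbed))  # True
```

The analogous test on real LRLMs (perturbing distant tokens and measuring the change in loss on held-out target tokens) is what the analysis uses to show that those models behave far more locally than their architectures suggest.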

Advisor: Mohit Iyyer

Join via Zoom