
Towards Effective Modeling of Long-range Context

Wednesday, 01/17/2024 2:00pm to 4:00pm
Hybrid CS203 & Zoom
PhD Thesis Defense
Speaker: Simeng Sun

Recent developments such as efficient self-attention and FlashAttention have enabled language models to process long input sequences with limited memory. While numerous efficient methods for scaling context size have been proposed, challenges persist on long-context tasks such as book-level summarization and long-document question answering.

In the first part of this talk, orthogonal to efforts in scaling context size, I will provide a systematic analysis of how language models utilize long-range context. The analysis is divided into token-level and segment-level evaluations. Our results demonstrate that segment-level tasks are better suited for evaluating long-context modeling than perplexity, the most commonly used intrinsic metric for language model evaluation. Building on this analysis, I will introduce a new segment-level task, in-book suffix identification, which requires the model to identify the correct suffix for a long prefix among incorrect suffixes sampled from the same book. Our results show that, as of late 2023, both proprietary and large open-source language models still struggle to achieve good suffix identification accuracy on our challenge dataset ChapterBreak. Moreover, these models underperform our small segment-level language model SuffixLM.

Having established the benefits of modeling context at the segment level, in the next part of the talk I will introduce a new training method, SuffixRL, which enables training a token-level language model with segment-level signals. To apply SuffixRL effectively to LLaMA-7B models with limited resources, we use low-rank adaptation and alternative regularizers, and demonstrate improved coherence in open-ended generation settings.
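To make the suffix identification setup concrete, below is a minimal sketch of one common way such a task can be scored with an off-the-shelf causal language model: condition on the long prefix, score each candidate suffix by its average token log-likelihood, and predict the highest-scoring candidate. This is an illustrative assumption, not the released ChapterBreak evaluation code; the model name and helper functions are placeholders.

```python
# Illustrative sketch of likelihood-based suffix identification.
# Assumes a HuggingFace-style causal LM; "gpt2" is a placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM with sufficient context length
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def suffix_logprob(prefix: str, suffix: str) -> float:
    """Average log-probability of the suffix tokens given the prefix."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    suffix_ids = tokenizer(suffix, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, suffix_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position t predict the token at position t + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = input_ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.size(0)), targets]
    # Only the suffix tokens contribute to the score.
    return token_lp[-suffix_ids.size(1):].mean().item()

def identify_suffix(prefix: str, candidates: list[str]) -> int:
    """Return the index of the candidate suffix the model scores highest."""
    scores = [suffix_logprob(prefix, c) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])
```

Under this scoring scheme, an accuracy number is simply the fraction of examples for which the model ranks the true next-chapter suffix above the negatives sampled from the same book.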

Advisor: Mohit Iyyer

Join via Zoom