
Improving the Coherence and Factuality of Text Generation with Retrieval-Augmented Language Models

Tuesday, 01/16/2024 2:00pm to 4:30pm
Zoom
PhD Dissertation Proposal Defense
Speaker: Shufan Wang

Large language models have made significant progress in natural language processing. However, they still struggle to generate factually accurate text and to maintain coherence in their output. In this dissertation, I present a series of studies that evaluate and improve language models through retrieval augmentation.

1) I first present a case study in which retrieval is used in long-form question answering (LFQA) to produce more coherent content and to support a more principled evaluation framework. In LFQA, "exemplification" (using examples to clarify concepts) is an important linguistic phenomenon. We treat exemplification as a retrieval problem rather than a generation problem (see the sketch below) and show that our Example Retriever (EgRET) outperforms existing state-of-the-art generative models by retrieving relevant examples. The retrieval approach also yields an evaluation metric that aligns more closely with human judgments.
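
A minimal sketch of the general idea of retrieving example sentences by embedding similarity to a query; the encoder name, query, and candidate pool here are illustrative placeholders, not the actual EgRET model or data.

```python
# Rank candidate example sentences by embedding similarity to a query.
# Placeholder encoder and data; not the EgRET setup itself.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

query = "What does 'confirmation bias' mean?"
candidates = [
    "For example, a reader may only seek out news sources that echo their views.",
    "The Pacific Ocean is the largest ocean on Earth.",
    "For instance, an investor might ignore losses that contradict their strategy.",
]

# Encode query and candidates, then score by cosine similarity.
q_emb = encoder.encode([query], normalize_embeddings=True)
c_emb = encoder.encode(candidates, normalize_embeddings=True)
scores = (q_emb @ c_emb.T).ravel()
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}  {candidates[idx]}")
```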

2) Next, effective text representation is crucial for retrieval quality. However, I find that despite the success of pre-trained encoders such as BERT, they struggle to produce meaningful embeddings for diverse linguistic units such as phrases. To address this challenge, I modify BERT through contrastive learning (see the sketch below) to build better phrase representations (Phrase-BERT). Experiments show that Phrase-BERT embeddings can be easily combined with a simple autoencoder to build a phrase-based neural topic model that interprets topics as mixtures of words and phrases.
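
A minimal sketch of an in-batch contrastive (InfoNCE-style) objective over phrase embeddings, illustrating the general recipe behind contrastive fine-tuning; the pooling, pairing, and temperature choices are assumptions for illustration, not the exact Phrase-BERT training setup.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor_emb, positive_emb, temperature=0.05):
    """anchor_emb, positive_emb: (batch, dim) embeddings of paired phrases
    (e.g., a phrase and a paraphrase). Each anchor's positive is the
    same-index row; all other rows in the batch act as negatives."""
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    logits = anchor @ positive.T / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for pooled BERT outputs.
a, p = torch.randn(8, 768), torch.randn(8, 768)
print(contrastive_loss(a, p).item())
```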

3) Additionally, not all applications of retrieval-augmented language models are immediately successful. I focus on interpolation-based retrieval-augmented language models (kNN-LM and TRIME), which mix the base model's next-token distribution with one induced by retrieved neighbors (see the sketch below), and find that although they reduce perplexity, they do not show corresponding improvements in open-ended text generation quality, in part because their gains are concentrated on only a subset of tokens. This finding suggests future directions for using retrieval during text generation, such as designing more selective retrieval mechanisms.
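
A minimal sketch of the kNN-LM interpolation step: the next-token distribution is a mixture of the base LM distribution and a distribution induced by retrieved nearest neighbors. The datastore lookup, shapes, and the interpolation weight are placeholders for illustration.

```python
import numpy as np

def knn_lm_interpolate(p_lm, neighbor_dists, neighbor_token_ids, vocab_size, lam=0.25):
    """p_lm: (vocab_size,) base LM next-token probabilities.
    neighbor_dists: (k,) distances from the current context to retrieved keys.
    neighbor_token_ids: (k,) the token that followed each retrieved key."""
    # Softmax over negative distances assigns each neighbor a weight.
    weights = np.exp(-neighbor_dists)
    weights /= weights.sum()
    # Aggregate neighbor weights into a distribution over the vocabulary.
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, neighbor_token_ids, weights)
    # Interpolate: lam controls how much the retrieval distribution matters.
    return lam * p_knn + (1.0 - lam) * p_lm

# Toy usage over a 5-token vocabulary; the result is still a valid distribution.
p_lm = np.full(5, 0.2)
p = knn_lm_interpolate(p_lm, np.array([1.0, 2.0]), np.array([3, 3]), vocab_size=5)
print(p, p.sum())
```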

4) Finally, I propose methods to improve language models' ability to generate factually accurate text, and to build more robust tools for evaluating the factual accuracy of the generated text. To improve factual accuracy, I will experiment with various prompting strategies and leverage recent advances in reinforcement learning from human feedback (RLHF). For factuality evaluation, I expand existing tools such as FactScore to incorporate a "recall perspective" (coverage of important and factual information), which current factuality evaluation methods overlook (see the sketch below).
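
A minimal sketch of pairing a FactScore-style precision (fraction of generated atomic facts supported by a knowledge source) with a recall-style coverage score (fraction of important reference facts that the generation covers). The `entails` verifier, fact sets, and exact-match toy check are hypothetical placeholders, not the proposed evaluation tool.

```python
def factuality_scores(generated_facts, reference_facts, entails):
    """entails(source_facts, fact) -> bool: whether `fact` is supported by `source_facts`.
    Precision follows the FactScore idea: how many generated atomic facts are supported.
    Recall adds the coverage view: how many important reference facts the generation includes."""
    precision = sum(entails(reference_facts, f) for f in generated_facts) / max(len(generated_facts), 1)
    recall = sum(entails(generated_facts, f) for f in reference_facts) / max(len(reference_facts), 1)
    return precision, recall

# Toy usage with exact-match "support"; a real verifier would use retrieval plus
# an NLI or LM-based judgment over a knowledge source.
gen = {"Paris is the capital of France", "Paris has a population of 80 million"}
ref = {"Paris is the capital of France", "Paris is located on the Seine"}
print(factuality_scores(gen, ref, lambda src, f: f in src))  # (0.5, 0.5)
```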

Advisor: Mohit Iyyer

Join via Zoom