Effective and Efficient Methods for Resource-constrained NLP

Tuesday, 05/31/2022 2:00pm to 4:00pm
PhD Dissertation Proposal Defense
Speaker: Tu Vu

Abstract: Recently, substantial progress has been made in the field of natural language processing (NLP), which is in large part due to the advent of large-scale pre-trained language models (i.e., deep neural networks with millions or billions of parameters pre-trained on large amounts of unlabeled data). While a few of these models exhibit impressive few-shot learning ability (i.e., the ability to solve a novel downstream task from only a few examples), state-of-the-art results still require fine-tuning on thousands or tens of thousands of downstream examples. In contrast, humans can learn a new concept with very little supervision (e.g., a child can generalize the concept of "cars" from just a few demonstrations in a book). To close the gap with this hallmark of human intelligence, this dissertation studies methods that exploit large volumes of unlabeled data or beneficial relationships among tasks to improve learning in resource-constrained scenarios.

First, through a large-scale study on task transferability over 3,000 combinations of tasks and data regimes within and across broad classes of NLP problems, we demonstrate conditions under which tasks can benefit each other (some of which are unintuitive or even contrary to common wisdom) and then propose effective task embedding methods (which represent tasks as real-valued vectors) to predict the most transferable source tasks for a given novel target task (by measuring similarity between task vectors). Second, we introduce a self-training method that uses task augmentation, a novel data augmentation technique that synthesizes a large amount of data for auxiliary-task fine-tuning from target-task unlabeled texts. We highlight important ingredients for successful self-training, including the use of a broad distribution of pseudo-labeled data. Our method substantially improves sample efficiency across many NLP benchmarks. Remarkably, on a binary sentiment classification task, with only 8 training examples per class, we achieve results comparable to standard fine-tuning with 67K training examples. Third, to get around the infeasibility of fine-tuning enormous language models in resource-constrained scenarios, we propose a parameter-efficient method that learns a soft prompt (a sequence of tunable tokens) on one or more source tasks, which is then used to initialize the prompt for a given target task to condition a frozen pre-trained model to perform the task. Our method (which uses a frozen model) either matches or outperforms standard model tuning (a.k.a. fine-tuning, which fine-tunes all model parameters) across model sizes. Finally, we discuss several avenues for future research, including our recent effort to test whether our proposed approaches extend successfully to a multilingual setting (i.e., cross-lingual transfer learning). We highlight the importance of using the right evaluation metric and choosing the right adaptation method when applying multilingual pre-trained language models in a zero-shot language transfer setting. Taken as a whole, we hope this dissertation will spur more research into effective and efficient methods for resource-constrained NLP.

Advisor: Mohit Iyyer

Join via Zoom