
Few-Shot Natural Language Processing by Meta-Learning Without Labeled Data

Friday, 03/26/2021 10:00am to 12:00pm
Zoom Meeting
PhD Dissertation Proposal Defense
Speaker: Trapit Bansal

Zoom Meeting: https://umass-amherst.zoom.us/j/93891903416?pwd=SlQvaE9WNzM4WktzZ2NIQ1NQcE5Idz09

Humans show a remarkable capability to accurately solve a wide range of problems efficiently, using only a limited amount of computation and experience. Deep learning models, in stark contrast, can be trained to be highly accurate on a narrow task while being highly inefficient in the amount of compute and data required to reach that accuracy. Within natural language processing (NLP), recent breakthroughs in unsupervised pretraining have enabled reusable models that can be applied to many NLP tasks; however, learning new tasks is still inefficient. This has led to research on few-shot learning, where the goal is to generalize to new tasks with very few labeled instances. Meta-learning, or learning to learn, treats the learning process itself as a learning problem from data, with the goal of producing systems that can generalize to new tasks efficiently. This has the potential to yield few-shot learners that can accurately solve a wide range of new tasks. However, meta-learning requires a distribution over tasks with relevant labeled data, which can be difficult to obtain, severely limiting the practical utility of meta-learning methods. In this dissertation, we develop methods to enable large-scale meta-learning from unlabeled text data and improve the few-shot generalization ability of NLP models.
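To make the few-shot setting concrete, the following minimal Python sketch shows the episodic structure that meta-learning typically builds on: each task supplies a small support set for adaptation and a query set for evaluation, and meta-training repeats this over many tasks so that the adaptation procedure itself improves. The function name and toy data pool are illustrative assumptions, not code from the dissertation.

import random

def sample_episode(task_pool, n_way=5, k_shot=4, n_query=10):
    # Sample one N-way, K-shot task: n_way classes, k_shot labeled support
    # examples per class for adaptation, and n_query held-out query examples
    # per class for evaluating the adapted model.
    classes = random.sample(list(task_pool), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        examples = random.sample(task_pool[cls], k_shot + n_query)
        support += [(text, episode_label) for text in examples[:k_shot]]
        query += [(text, episode_label) for text in examples[k_shot:]]
    return support, query

# Toy pool mapping each class label to example texts; in practice these would
# come from real NLP tasks (or, as proposed here, be created from unlabeled text).
pool = {f"label_{i}": [f"example {j} for label {i}" for j in range(20)] for i in range(8)}
support_set, query_set = sample_episode(pool, n_way=3, k_shot=2, n_query=5)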

We contribute methods that synthetically create meta-training tasks from unlabeled text, providing a large task distribution for meta-learning. This enables representation learning and rapid learning of new tasks by meta-learning from millions of self-supervised tasks, can be combined with supervised tasks to regularize supervised meta-training, and leads to accurate few-shot learning on a diverse set of NLP classification tasks. Since real-world applications of NLP require learning diverse tasks with different numbers of classes, we first introduce an optimization-based meta-learning method that can learn from multiple NLP classification tasks with any number of classes. We then leverage the proposed self-supervised approach to create meta-training tasks with diverse numbers of classes, and meta-train to directly optimize for future fine-tuning with few examples. This enables learning the model initialization as well as key hyper-parameters, such as learning rates, and leads to efficient few-shot learning of new tasks. We further explore modifications that control the difficulty and diversity of the tasks, and contrast them in terms of their suitability for few-shot learning. Our findings yield accurate and efficient meta-learning methods that improve few-shot generalization to diverse tasks and enable future applications to many other meta problems in NLP.
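As one concrete, simplified illustration of how classification tasks could be synthesized from unlabeled text, the sketch below builds masked-word-prediction tasks: a handful of word types are chosen, the sentences containing them are masked, and the identity of the removed word serves as the class label. The construction, names, and toy corpus are assumptions for illustration and are not claimed to be the exact procedure developed in the dissertation.

import random

MASK = "[MASK]"

def make_self_supervised_task(sentences, n_way=3, per_class=2):
    # Build one synthetic classification task from unlabeled sentences:
    # pick n_way word types, mask them out of the sentences that contain them,
    # and label each masked sentence by which word was removed. Because the
    # "classes" are just word types, an essentially unlimited number of tasks,
    # with any number of classes, can be generated from raw text alone.
    occurrences = {}
    for sent in sentences:
        for word in set(sent.lower().split()):
            occurrences.setdefault(word, []).append(sent)
    candidates = [w for w, sents in occurrences.items() if len(sents) >= per_class]
    chosen = random.sample(candidates, n_way)
    examples = []
    for label, word in enumerate(chosen):
        for sent in random.sample(occurrences[word], per_class):
            masked = " ".join(MASK if tok.lower() == word else tok for tok in sent.split())
            examples.append((masked, label))
    return chosen, examples

# Tiny illustrative corpus; in practice this would be a large unlabeled corpus.
corpus = [
    "the model learns quickly from few examples",
    "meta learning treats learning itself as a problem",
    "few shot learning needs good representations",
    "the model adapts to a new task with few labels",
]
class_words, task_examples = make_self_supervised_task(corpus, n_way=3, per_class=2)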

Advisor: Andrew McCallum