
Machine Learning Models for Efficient and Robust Natural Language Processing

Thursday, 01/24/2019 9:00am to 12:00pm
CS303
Ph.D. Dissertation Proposal Defense
Speaker: Emma Strubell

Abstract:

Natural language processing (NLP) has come of age. For example, semantic role labeling (SRL), which automatically annotates sentences with a labeled graph representing "who" did "what" to "whom", has seen nearly a 40% reduction in error over the past ten years, bringing it to useful accuracy. As a result, hordes of practitioners now want to deploy NLP systems on billions of documents across many domains. However, state-of-the-art NLP systems are typically optimized for neither cross-domain robustness nor computational efficiency. In this proposal I will present the methods I have developed so far to facilitate fast and robust inference across many of the most common building blocks for NLP, and propose the work that remains to complete a suite of scalable NLP tools with robust performance across text domains and languages.
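To make the SRL task concrete, the toy annotation below shows the kind of predicate-argument structure such a system produces. The sentence and the surrounding Python are illustrative only, though the ARG0/ARG1/ARGM-TMP labels follow standard PropBank conventions.

```python
# A hypothetical SRL annotation for one sentence, written as simple
# predicate-argument pairs (labels follow PropBank conventions).
sentence = "The board approved the merger on Friday."
srl_annotation = {
    "predicate": "approved",
    "arguments": [
        ("ARG0", "The board"),      # who: the agent doing the approving
        ("ARG1", "the merger"),     # what: the thing approved
        ("ARGM-TMP", "on Friday"),  # when: temporal modifier
    ],
}
```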

First, I will describe Dynamic Feature Selection, paired learning and inference algorithms that accelerate inference in linear classifiers, the heart of the fastest NLP models, by 5-10x. I will then present Iterated Dilated Convolutional Neural Networks (ID-CNNs), a distinct combination of network structure, parameter sharing and training procedures that increases inference speed by 14-20x while matching the accuracy of Bi-LSTMs, the most accurate models for NLP sequence labeling. Finally, I will describe Linguistically-Informed Self-Attention (LISA), a neural network model that combines multi-head self-attention with multi-task learning to facilitate improved generalization to new domains. We show that incorporating linguistic structure through the attention mechanism leads to substantial improvements over the previous state-of-the-art (syntax-free) neural network models for SRL, especially when evaluating out-of-domain.
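The two neural models rest on ideas that are easy to sketch. The first block below is a minimal, illustrative dilated-convolution tagger in the spirit of ID-CNNs: a small stack of dilated convolutions is applied repeatedly with shared parameters, so each token's representation covers a wide context at low cost. Layer sizes, the dilation schedule, and the number of iterations are placeholder values, not the configuration used in the dissertation work.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """A few 1-D convolutions with growing dilation; padding keeps sequence length fixed."""
    def __init__(self, hidden=128, dilations=(1, 2, 4)):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=d, padding=d)
            for d in dilations
        ])

    def forward(self, x):                      # x: (batch, hidden, seq_len)
        for conv in self.convs:
            x = torch.relu(conv(x))
        return x

class IDCNNTagger(nn.Module):
    """Iterate the same dilated block with shared parameters, then score tags per token."""
    def __init__(self, vocab_size, num_tags, hidden=128, iterations=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.block = DilatedBlock(hidden)
        self.iterations = iterations
        self.out = nn.Linear(hidden, num_tags)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)
        for _ in range(self.iterations):       # "iterated": reuse the same parameters
            x = self.block(x)
        return self.out(x.transpose(1, 2))     # (batch, seq_len, num_tags)
```

The second block sketches the central LISA idea: one self-attention head is trained with an auxiliary loss to place its attention mass on each token's syntactic head, so later layers can attend along the predicted parse while the whole network is trained end-to-end for SRL. Again, the dimensions and single-head setup are illustrative assumptions rather than the published architecture.

```python
import torch.nn as nn
import torch.nn.functional as F

class SyntacticAttentionHead(nn.Module):
    """One attention head whose distribution is supervised to point at dependency heads."""
    def __init__(self, dim=128):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x, gold_heads=None):
        # x: (batch, seq_len, dim); gold_heads: (batch, seq_len) head indices (training only)
        scores = self.q(x) @ self.k(x).transpose(1, 2) / x.size(-1) ** 0.5
        context = scores.softmax(dim=-1) @ self.v(x)    # syntax-aware token states
        aux_loss = None
        if gold_heads is not None:
            # Multi-task term: each row of `scores` should peak at the token's head.
            aux_loss = F.cross_entropy(scores.flatten(0, 1), gold_heads.flatten())
        return context, aux_loss
```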

I will conclude by identifying a number of shortcomings in my completed work and proposing models and experiments to address them: (1) completing careful speed experiments characterizing the run-time performance gains from multi-task modeling over a pipeline of single-task models; (2) extending LISA from sentence- to document-level reasoning by incorporating the tasks of mention-finding and within-document coreference resolution; and (3) enabling generalization of SRL to new languages without labeled SRL data by designing adversarial training objectives for LISA, where model parameters are explicitly optimized for cross-lingual generalization.
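The adversarial objective in (3) is left open in the proposal. One common way to realize such an objective is domain-adversarial training with a gradient-reversal layer, in which a language discriminator learns to identify the input language while the shared encoder is trained, through the reversed gradient, to make languages indistinguishable. The sketch below illustrates only that general recipe; the names (GradReverse, LanguageDiscriminator) and the loss combination are assumptions, not the specific design proposed in the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class LanguageDiscriminator(nn.Module):
    """Predicts the source language from a sentence encoding; the encoder sees the
    reversed gradient, pushing it toward language-invariant representations."""
    def __init__(self, dim=128, num_languages=2):
        super().__init__()
        self.clf = nn.Linear(dim, num_languages)

    def forward(self, sentence_repr, language_labels, lam=1.0):
        reversed_repr = GradReverse.apply(sentence_repr, lam)
        logits = self.clf(reversed_repr)
        return F.cross_entropy(logits, language_labels)

# Training-step sketch: total loss = SRL loss on labeled (source-language) data
# plus a weighted adversarial language loss on sentences from both languages, e.g.
#   loss = srl_loss + adv_weight * discriminator(pooled_states, lang_ids)
```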

Advisor: Andrew McCallum