Deep Energy-Based Models for Structured Prediction

Friday, 05/26/2017 9:30am to 11:30am
Computer Science Building, Room 151
Ph.D. Thesis Defense
Speaker: David Belanger

"Deep Energy-Based Models for Structured Prediction"

We introduce structured prediction energy networks (SPENs), a flexible framework for structured prediction. A deep architecture defines an energy function over candidate outputs, and predictions are produced by gradient-based energy minimization. This deep energy captures dependencies among labels that would render a graphical model intractable, and it automatically discovers discriminative features of the structured output. Furthermore, practitioners can explore a wide variety of energy function architectures without having to hand-design prediction and learning methods for each model, because all of our prediction and learning methods interact with the energy only through the standard interface for deep networks: forward and back-propagation. In a variety of applications, we find that approximate minimization of non-convex deep energy functions yields better accuracy than baseline models that employ simple energy functions for which exact minimization is tractable.
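As a rough illustration of the prediction procedure described above, the sketch below minimizes an energy over a relaxed output space by projected gradient descent. The quadratic energy, its analytic gradient (standing in for back-propagation), and all names here are illustrative assumptions, not the thesis's actual models:

```python
import numpy as np

# Hypothetical minimal sketch of SPEN-style prediction. The quadratic
# "energy" below is a stand-in for a deep energy network; its analytic
# gradient plays the role of back-propagation w.r.t. the output y.

def energy(y, scores, W):
    """Energy over a relaxed labeling y in [0,1]^L:
    local label scores plus a pairwise label-interaction term."""
    return -scores @ y + 0.5 * y @ W @ y

def energy_grad(y, scores, W):
    # Gradient of the energy w.r.t. y (what backprop would compute).
    return -scores + 0.5 * (W + W.T) @ y

def predict(scores, W, steps=100, lr=0.1):
    """Gradient-based energy minimization over the relaxed outputs,
    projecting back onto [0,1]^L after each step."""
    y = np.full_like(scores, 0.5)  # start at the center of the relaxation
    for _ in range(steps):
        y = np.clip(y - lr * energy_grad(y, scores, W), 0.0, 1.0)
    return y

rng = np.random.default_rng(0)
L = 5
scores = rng.normal(size=L)   # input-dependent local label scores
W = rng.normal(size=(L, L))   # label-dependency weights (may be non-convex)
y_hat = predict(scores, W)    # relaxed prediction; threshold at 0.5 for labels
```

Rounding the relaxed `y_hat` at 0.5 gives a discrete labeling; the thesis's convex-relaxation results concern when and how such rounding is safe.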

This thesis contributes methods for improving the speed, flexibility, and accuracy of SPENs. These include: convex relaxations for discrete labeling problems; end-to-end training, in which we backpropagate through the gradient-based prediction process; sampling-based training, which helps explore the output space; regularization methods that make gradient-based prediction converge quickly; and hybrid models that combine conditional random fields and SPENs.

Advisor: Andrew McCallum