Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities

April 3, 2009
6:30am to 8:30am
Ph.D. Seminar

Xuerui Wang

Computer Science Building, Room 150

The abundance of textual data in the information age poses an immense challenge: how to perform large-scale inference to understand and utilize this overwhelming amount of information. We develop effective and efficient statistical topic models for massive text collections by incorporating extra information from other modalities in addition to the text itself. Text documents are not just text: research papers carry author information, email messages contain social sender-recipient links, legislative resolutions are recorded with votes, and so on. These kinds of additional information are naturally interleaved with the text. Most previous work, however, pays attention to only one modality at a time and ignores the others. In this talk, I will present a series of probabilistic topic models that show how we can bridge multiple modalities of information, in a unified fashion, for various tasks. Interestingly, joint inference over multiple modalities leads to many findings that cannot be discovered from any one modality alone, which is clear evidence that we can better understand and utilize massive text collections when additional modalities are considered and modeled jointly with the text.

Advisor: Andrew McCallum