Embedding: Choose Right Relations to Embed

Thursday, 09/13/2018 12:00pm to 1:00pm
Computer Science Building, Room 150/151
Machine Learning and Friends Lunch

Word embeddings are a widely used tool for analyzing language. Exponential family embeddings generalize the technique to other types of data by modeling the conditional probability of a target observation (a word or an item) given the elements in its context (other words or items).
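As a rough illustration of what "conditional probability of a target given its context" can look like, here is a minimal sketch that assumes a Poisson likelihood for count data. The function name, the choice of Poisson, and the way the context vectors are summed are illustrative assumptions, not details taken from the talk.

```python
import math


def poisson_embedding_log_prob(x_target, rho, alphas, context_values):
    """Log-probability of a target count given its context (illustrative sketch).

    rho            -- embedding vector of the target item (assumed name)
    alphas         -- one context vector per context item (assumed name)
    context_values -- observed values of the context items
    """
    dim = len(rho)
    # Combine the context: sum of context vectors weighted by observed values.
    context_vec = [0.0] * dim
    for alpha, value in zip(alphas, context_values):
        for d in range(dim):
            context_vec[d] += value * alpha[d]
    # Natural parameter: inner product of target embedding and context vector.
    eta = sum(r * c for r, c in zip(rho, context_vec))
    rate = math.exp(eta)
    # Poisson log-pmf with log-rate eta.
    return x_target * eta - rate - math.lgamma(x_target + 1)
```

Swapping the last two lines for a different exponential family distribution (for example Bernoulli or Gaussian) would adapt the same sketch to other data types.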

One challenge in fitting embedding methods is sparse data, such as a document/term matrix that contains many zeros. We develop zero-inflated embeddings to address this issue. In a zero-inflated embedding (ZIE), a zero in the data can come from an interaction with other data (i.e., an embedding) or from a separate process by which many observations are equal to zero (i.e., a probability mass at zero). Fitting a ZIE naturally down-weights the zeros and dampens their influence on the model.
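The two sources of zeros can be sketched as a simple mixture. The example below assumes a Poisson embedding component and a single constant zero-mass probability `pi_zero`. Both are simplifying assumptions made for illustration, and the model presented in the talk may parameterize this differently.

```python
import math


def zero_inflated_poisson_log_prob(x, rate, pi_zero):
    """Log-probability of a count under a zero-inflated Poisson (sketch).

    rate    -- Poisson rate produced by the embedding component (assumed)
    pi_zero -- probability mass placed directly at zero (assumed constant)
    """
    if x == 0:
        # A zero comes either from the point mass or from the Poisson itself.
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(-rate))
    poisson_log_pmf = x * math.log(rate) - rate - math.lgamma(x + 1)
    return math.log(1.0 - pi_zero) + poisson_log_pmf


def embedding_responsibility_for_zero(rate, pi_zero):
    """Posterior probability that an observed zero came from the embedding."""
    from_embedding = (1.0 - pi_zero) * math.exp(-rate)
    return from_embedding / (pi_zero + from_embedding)
```

In this sketch the responsibility is strictly below one whenever `pi_zero` is positive, which is one way to see why zeros carry less weight than non-zero observations when the embedding component is fit.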

Another challenge is that the apparent context often contains unrelated items. An embedding model that considers all context elements will encode noisy co-occurrences as item relations in the embedding. We improve the quality of the embedding representations by choosing a subset of context elements for the embedding model. We develop a probabilistic attention model and use amortized variational inference to choose this subset automatically.

Liping Liu is the Schwartz Family Assistant Professor at Tufts University. His research interests include variational inference, generative models, and embedding models. Before joining Tufts, Liu was a postdoctoral associate at Columbia University, where, advised by Prof. David Blei, he worked on probabilistic embedding models. He earned his doctorate at Oregon State University, where he studied probabilistic models and applied these techniques to ecology studies. He also has industry experience at IBM T.J. Watson Research and Alibaba. He is a reviewer for major machine learning conferences and journals, such as ICML, NIPS, ICLR, AISTATS, JMLR, and TPAMI.