Learning with Aggregate Data

Friday, 02/23/2018 9:00am to 11:00am
Lederle Graduate Research Center, Room A310
Ph.D. Dissertation Proposal Defense
Speaker: Tao Sun

"Learning with Aggregate Data"

Many real-world applications involve dealing directly with aggregate data. In this work, we study learning with aggregate data from several perspectives and address the combinatorial challenges that arise.

First, we study the problem of learning in Collective Graphical Models (CGMs), where only noisy aggregate observations are available. We show that exact inference is NP-hard and propose an approximate inference algorithm for CGMs. Solving these inference problems enables us to build large-scale bird migration models, as well as models of human mobility under differential privacy.
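
To make the CGM setting concrete, here is a minimal toy sketch of how noisy aggregate observations arise: many individuals follow the same Markov chain over locations, but only (noisy) per-location counts are observed, never individual trajectories. The population size, transition matrix, and Poisson noise model below are illustrative assumptions, not the proposal's actual model or inference algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy CGM data-generating setting (assumed illustration):
# M individuals each follow the same Markov chain over L locations for T steps.
M, L, T = 500, 3, 4
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])            # assumed transition matrix

states = rng.integers(0, L, size=M)        # hidden individual locations
counts = np.zeros((T, L), dtype=int)
for t in range(T):
    counts[t] = np.bincount(states, minlength=L)   # true aggregate counts
    # each individual moves independently according to row P[state]
    states = np.array([rng.choice(L, p=P[s]) for s in states])

# Only noisy aggregates are observed (here, Poisson-noised counts),
# which is what makes inference over the hidden counts combinatorial.
noisy = rng.poisson(counts)
print(noisy)
```

Each row of `counts` sums to the population size M; inference in a CGM must reason about which joint count tables are consistent with the noisy aggregates.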

Second, we consider problems given bags of instances and bag-level aggregate supervision. Specifically, we study the US presidential election and build a model of the voting preferences of individuals and demographic groups. The data consists of characteristic individuals drawn from census data within each voting precinct, together with aggregate supervision in the form of per-precinct vote totals. We adopt Learning with Label Proportions (LLP) to build an instance-level model, and propose a fully probabilistic LLP model with efficient exact inference.
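
The LLP setting can be sketched with a simple baseline: fit an instance-level classifier using only bag-level label proportions, by making each bag's mean predicted probability match its observed proportion. This is a common LLP baseline for illustration only, not the fully probabilistic model with exact inference proposed here; the synthetic data, weights, and learning rate are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic LLP data (assumed illustration): bags of individuals with
# features, supervised only by each bag's positive-label proportion.
true_w = np.array([2.0, -1.0])                     # hypothetical true weights
bags = [rng.normal(size=(30, 2)) for _ in range(50)]
props = [sigmoid(X @ true_w).mean() for X in bags]  # bag-level supervision

# Baseline: gradient descent on the squared gap between each bag's
# mean predicted probability and its observed label proportion.
w = np.zeros(2)
lr = 0.5
for _ in range(200):
    grad = np.zeros(2)
    for X, p in zip(bags, props):
        q = sigmoid(X @ w)
        # gradient of (mean(q) - p)^2 with respect to w
        grad += 2.0 * (q.mean() - p) * (q * (1.0 - q)) @ X / len(X)
    w -= lr * grad / len(bags)

print(w)
```

Even though no individual labels are ever seen, matching proportions across many diverse bags constrains the instance-level model.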

Third, we study Distribution Regression (DR), which has a setting similar to LLP but builds bag-level models. Existing methods build on kernel mean embeddings and rely on strong kernel assumptions; we relax these assumptions by proposing a simple deep-neural-network-based model that can handle data with large bag sizes and high feature dimensionality.
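
A network of this general shape can be sketched as an embed-pool-regress pipeline: embed each instance, mean-pool over the bag to get a permutation-invariant summary, then regress on the pooled vector. The architecture, widths, and random weights below are illustrative assumptions (forward pass only), not the proposal's exact model.

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(x):
    return np.maximum(x, 0.0)

# Assumed illustrative dimensions and untrained weights.
d, h = 5, 16                         # feature dim, hidden width
W1 = rng.normal(size=(d, h)) * 0.1   # instance-wise embedding layer
W2 = rng.normal(size=(h, 1)) * 0.1   # bag-level regression layer

def predict(bag):
    """bag: (n_instances, d) array of one bag's instances -> scalar."""
    embedded = relu(bag @ W1)        # embed each instance independently
    pooled = embedded.mean(axis=0)   # permutation-invariant mean pooling
    return float(pooled @ W2)        # regress on the bag summary

small_bag = rng.normal(size=(10, d))
large_bag = rng.normal(size=(10000, d))
print(predict(small_bag), predict(large_bag))
```

Because pooling is a mean over instances, the cost grows linearly with bag size and the prediction is invariant to instance order, without any kernel assumptions on the bag distribution.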

Advisor: Daniel Sheldon