Measurement and Causal Inference for Social Data Science with Text

Thursday, 01/21/2021 11:00am to 1:00pm
Zoom Meeting
PhD Dissertation Proposal Defense
Speaker: Katie Keith

Zoom Meeting: https://umass-amherst.zoom.us/j/93555507517


This thesis is in the domain of social data science, which examines human behavior through quantitative analysis of large-scale data, and focuses on methods for text data because language is one of the richest and most salient expressions of human thought and behavior. The methods presented build upon two central themes for text data: (a) measurement, the construction of quantifiable summaries of empirical phenomena, and (b) causal inference, the estimation of cause-and-effect relationships. Regarding measurement, we first examine cross-document entity-event measurement and present an empirical pipeline that identifies the names of civilians killed by police from a corpus of news documents. We then examine prevalence estimation, the task of inferring the relative frequency of classes among the unlabeled examples in a group; we present a generative probabilistic modeling approach to prevalence estimation for documents, and we construct and evaluate prevalence confidence intervals. Regarding causal inference, we gather and categorize applications that use text to remove confounding from causal estimates, and we provide a guide to data processing and evaluation decisions in this space. The proposed work is to develop methods that improve the use of human judgements to validate the adjustments of causal estimates with text.

Committee Members:

Brendan O'Connor (Chair)
David Jensen
Mohit Iyyer
Douglas Rice (Outside Member, Political Science)