
Sociolinguistically Driven Approaches for Just Natural Language Processing

Friday, 02/14/2020 1:00pm to 4:00pm
CS 303
PhD Dissertation Proposal Defense
Speaker: Su Lin Blodgett


Natural language processing (NLP) systems are now ubiquitous. Yet the benefits of these technologies do not accrue evenly to all users; NLP systems can reproduce harmful stereotypes, perform poorly for speakers of "non-standard" language varieties, and prevent such speakers from participating fully in public discourse. How harms arise in NLP systems, and who is harmed by them, can only be understood at the intersection of work in NLP, fairness and justice in machine learning, and sociolinguistics. In this thesis, we propose to address two questions at this intersection: i) how can we develop just NLP systems, and ii) how can we develop computational approaches for examining language variation in large-scale datasets? We will discuss how these questions are tightly connected, and propose the following contributions: i) a model and dataset for quantifying the performance of two types of NLP tools on African-American English (AAE)-like text, ii) an investigation of how current literature in NLP understands bias and a new set of proposed research directions, iii) a proposed taxonomy of harms arising from NLP systems, iv) an adaptation of the measurement modeling framework from the quantitative social sciences for effectively evaluating approaches for quantifying bias in NLP systems, and v) a proposed computational sociolinguistic study of regional variation in AAE morphosyntax.

Advisor: Brendan O'Connor