Sociolinguistically Driven Approaches for Just Natural Language Processing

Tuesday, 09/08/2020, 11:00 a.m. to 1:00 p.m.
Zoom Meeting
PhD Thesis Defense
Speaker: Su Lin Blodgett

To join the Zoom call, please contact Su Lin Blodgett (blodgett@cs.umass.edu) for details. If interested, please include who you are, your affiliation, and your reason for joining.

Abstract

Natural language processing (NLP) systems are now ubiquitous. Yet the benefits of these language technologies do not accrue evenly to all users, and indeed these technologies can be harmful; NLP systems reproduce stereotypes, prevent speakers of "non-standard" language varieties from participating fully in public discourse, and re-inscribe historical patterns of linguistic stigmatization and discrimination. How harms arise in NLP systems, and who is harmed by them, can only be understood at the intersection of work on NLP, on fairness and justice in machine learning, and on the relationships between language and social justice. In this dissertation, we propose to address two questions at this intersection: i) how can we conceptualize harms arising from NLP systems? and ii) how can we quantify such harms?
 
We propose the following contributions. First, we develop a model to collect the first large dataset of African-American Language (AAL)-like social media text. We use this dataset to quantify the performance of two types of NLP systems, identifying disparities in model performance between Mainstream U.S. English (MUSE)-like and AAL-like text. Turning to the landscape of bias in NLP more broadly, we then provide a critical survey of the emerging literature on bias in NLP and identify its limitations. Drawing on work across sociology, sociolinguistics, linguistic anthropology, social psychology, and education, we provide an account of the relationships between language and injustice, propose a taxonomy of harms arising from NLP systems grounded in those relationships, and offer a set of guiding research questions for work on bias in NLP. Finally, we adapt the measurement modeling framework from the quantitative social sciences to evaluate approaches for quantifying bias in NLP systems. We conclude with a discussion of recent work on bias through the lens of style in NLP, raising a set of normative questions for future work in the field.

Committee: Brendan O'Connor (chair), Mohit Iyyer, Hanna Wallach, and Lisa Green (linguistics)