Spotlight: Voices of Data Science 2021, "Data Science for the Common Good"

Photo illustration: VODS panelists

Last weekend, over 200 current and aspiring data scientists participated in Voices of Data Science, a student-run conference co-sponsored by CICS and the Center for Data Science. The conference aims to provide a platform to amplify the voices of data scientists from underrepresented communities; in this inaugural year, the focus was on the success of women (cis and trans) and non-binary data scientists. The event featured talks from data science professionals working in areas including healthcare, social science, and energy infrastructure.

At one panel, “Data Science for the Common Good,” a group of early-career faculty and researchers discussed how they use machine learning and data science to improve lives, taking on challenges that include decarbonizing the power grid, expanding electricity access in parts of the world where access is limited, and reducing harm and bias in natural language processing (NLP).

A common theme that emerged was the need for AI researchers to match their tools to the problems they are attempting to solve. For example, Priya Donti, a doctoral student at Carnegie Mellon University, said that when working with the power grid, she thinks about how to “create layers within neural networks that actually represent power system physics, or other constraints that we have in the power system,” in order to avoid catastrophic issues like blackouts.

In another example, Su Lin Blodgett ’18MS, ’20PhD, now a postdoctoral researcher at Microsoft Research Montréal, discussed her student collaboration with Distinguished Professor Lisa Green of the Department of Linguistics, a preeminent expert in African-American English (AAE). In their 2016 paper, they demonstrated the ineffectiveness of existing NLP tools in parsing AAE-like text on Twitter, and developed new parsing models for AAE. “I think a lot of times we are tempted to think of unfairness as arising from a lack of data,” she explains. “But in this case, all the extra data in the world wouldn’t have helped if the grammatical structures [are] not legible to the formalism we use to parse them.”

The group broadly agreed on the need for data scientists to collaborate with experts outside computer science, bring in frameworks from other disciplines, and center those who will be most affected by their work. When working on projects for the common good, researchers and technologists should remember that “communities have lived expertise,” as Donti put it. “They understand the problems that they’re facing, the problems that we’re trying to solve … it is about building long-term relationships with communities and grassroots organizations and others who are thinking about these issues.”

The panel also featured Kasthuri Jayarajah, research fellow and adjunct faculty at Singapore Management University; Simone Nsutezo Fobi, doctoral student at Columbia University; and CICS Assistant Professor Laure Thompson.

Watch the full set of talks from the 2021 Voices of Data Science conference. The “Data Science for the Common Good” panel begins at 1:39:00 in the “Day 1 Part 1” video.