
Doctoral Graduate Emma Tosch’s Work Highlighted by ACM SIGPLAN

Emma Tosch

A paper by UMass Amherst College of Information and Computer Sciences (CICS) recent doctoral graduate Emma Tosch, co-authored by Eytan Bakshy of Facebook and CICS Professors Emery Berger, David Jensen, and Eliot Moss, was recently selected by the ACM Special Interest Group on Programming Languages (SIGPLAN) to be featured as a "Research Highlight." SIGPLAN has provided highlights for only three such papers this year, selected from over 200 papers published at SIGPLAN-sponsored conferences based on their quality and broad appeal. 

The research team's paper, "PlanAlyzer: Assessing Threats to the Validity of Online Experiments," introduces a novel approach to statically checking for common errors in online field experiments (such as A/B tests of content or user experience on websites). The current method for verifying whether an experiment's results are trustworthy, a property known as "internal validity," requires manual review by someone with expertise in experimental design.

In contrast, Tosch's PlanAlyzer offers an automatic approach that works with PlanOut, the widely used framework and programming language for online field experiments originally developed by Eytan Bakshy and colleagues at Facebook and Stanford University. PlanAlyzer first checks PlanOut programs for a variety of threats to the internal validity of an experiment, such as rendering bugs in browsers or other factors outside the designer's control, failing to record values required for analysis, or failing to properly link an analysis to the original design. If a program passes these tests, PlanAlyzer generates contrasts, the pairs of tested variables that will be compared to each other, for the designer.
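To give a sense of the kind of experiment script PlanAlyzer analyzes, the sketch below mimics, in plain Python, the deterministic hash-based parameter assignment that frameworks like PlanOut use. This is not actual PlanOut code or the PlanAlyzer tool; the function names, choices, and salting scheme are illustrative assumptions only.

```python
import hashlib

def uniform_choice(choices, unit, salt):
    """Deterministically assign a unit (e.g., a user id) to one of the
    given choices, in the spirit of hash-based randomization used by
    experimentation frameworks such as PlanOut. The salt keeps different
    experiment parameters independently randomized."""
    digest = hashlib.sha1(f"{salt}.{unit}".encode()).hexdigest()
    return choices[int(digest, 16) % len(choices)]

def assign(userid):
    """A toy experiment script with two randomized parameters; each pair
    of parameter values is a candidate 'contrast' for later analysis."""
    return {
        "button_color": uniform_choice(["blue", "green"], userid, "button_color"),
        "button_text": uniform_choice(["Sign up", "Join now"], userid, "button_text"),
    }

# Hash-based assignment is deterministic, so a returning user
# always sees the same variant.
assert assign(42) == assign(42)
```

A static checker in the style of PlanAlyzer would inspect such a script to confirm, for example, that every randomized parameter is actually recorded for analysis and that the planned contrasts match how the parameters were assigned.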

The researchers view this work as a first, foundational step toward automating the validation of experiments. They tested their new tool on hundreds of previously deployed online experiment scripts provided to the team by Facebook, "mutating" some of them to introduce errors. When analyzing mutated scripts, PlanAlyzer produced only two incorrect results out of fifty. When analyzing valid (non-mutated) scripts, it automatically generated 82% of the contrasts hand-specified by the experiment designers.

Read the paper or watch the presentation by Tosch from the SPLASH 2019 conference.