
Helena: A Web Automation Language for End Users

Wednesday, 10/03/2018 4:00pm to 5:00pm
Computer Science Building, Room 150/151
Rising Stars
Speaker: Sarah Chasins

Abstract: Web data is revolutionizing the social sciences. Researchers envision a diverse range of studies facilitated by the unique properties of web data: its scale, ecological validity, and timeliness. With the wide variety of web scripting libraries on offer, programmers have access to increasing language support for collecting web data; however, these libraries are inaccessible to non-programmers, and empowering non-programmers to collect these datasets is a long-standing open problem. To democratize access to web data, we designed the Helena web automation language. Helena brings together the following key innovations, which together empower end users to write robust web scraping programs: (i) The Helena programming environment uses Programming by Demonstration (PBD), which makes scripts easy to write; the tool takes a single-shot learning approach, creating scripts from a recording of a single user interaction with a set of webpages. Empirically, users can learn the tool and use it to write a robust large-scale scraping script in under 10 minutes, while programmers tackling the same task with the traditional Selenium library time out after an hour. (ii) Helena's adaptive replayer makes scripts robust to webpage redesigns and obfuscation, which enables longitudinal experiments. (iii) Helena's novel runtime can parallelize and distribute scraping programs for speedups of over 50x, facilitating large-scale scraping. Our approach relied on novel insights into the web scraping domain but also on bringing new techniques to bear. By combining techniques from the Programming Languages community and the Human-Computer Interaction community, we arrived at a language design that meets real users' needs.

Bio: Sarah Chasins is a PhD candidate at UC Berkeley, advised by Ras Bodik. Her research interests lie at the intersection of programming languages and HCI. She works on end-user programming, program synthesis, and programming language design. Much of her work is shaped by ongoing collaborations with social scientists in fields ranging from Sociology to Economics to Public Policy. She believes well-designed languages and programming environments can put complicated programming tasks within reach of people who consider themselves non-coders. She has been awarded an NSF Graduate Research Fellowship and a first-place award in the ACM Student Research Competition.

A reception for attendees will be held at 3:30 p.m. in CS 150.
