Data Science Tea

Tuesday, 04/19/2016 4:00pm to 5:30pm
Computer Science Building, Room 150/151
Special Event

Please join us for tea and refreshments (fruit, pastries, cookies and other light snacks) and three presentations by graduate students in Electrical and Computer Engineering and Computer Science.

Abhishek Dwaraki (Electrical and Computer Engineering)
Title: Towards Self-Healing Networks: How Data Science can help Software-Defined Networking

Abstract: Software-Defined Networking, or SDN for short, has been revolutionizing the networking industry by equipping it with much-needed flexibility and programmability. How far can SDN drive today's networks? As we move towards more powerful data science techniques and methodologies, can these methods be used in tandem with programmable networks to create autonomous entities that can fix themselves in the face of network faults?

Rick Freedman (Computer Science)
Title: Using Metadata to Automate Interpretations of Topic Models

Abstract: As massive datasets become more readily available, it also becomes difficult to properly annotate some forms of data, including large collections of documents and sensor readings. This is where unsupervised machine learning algorithms such as topic modeling help: they can cluster data without annotation, given some set of parameters. However, the resulting clusters are not always intuitive to a human due to the formulaic learning procedures as well as the interpretability of the inputs. We introduce the use of feature vectors for each input as metadata in order to autonomously derive descriptions of learned clusters. As an example, we present our current results of applying this method to activity recognition using topic models for the Cornell Activity Dataset (CAD) 120.

Daniel Barowy (Computer Science)
Title: WoCMan: Harnessing the Wisdom of the Crowd for High-Quality Estimates

Abstract: Estimation is common to many computational problems. "Where are the person's eyes in this photo?", "At what time in this audio recording does the interviewee accidentally swear?", and "How many calories are in the food shown in this image?" are questions where the answer is an estimate of an unknown real value. While machine learning is capable of answering some of these questions, building such systems requires extensive domain knowledge in ML. Furthermore, ML systems introduce a chicken-and-egg problem: training data must be gathered by other means. Surprisingly, non-expert groups of people are capable of producing accurate estimates. This phenomenon, known as the wisdom of the crowd, is a promising way to make estimation available to both ordinary programmers and ML software designers who need to bootstrap their systems. We introduce WoCMan, a domain-specific language (DSL) designed to make it easy for programmers to obtain high-quality estimates from the crowd. WoCMan obtains interval estimates over arbitrary user-defined functions of crowd responses. Programmers declare their desired precision and budget, and WoCMan iteratively increases the sample size until either the estimate is sufficiently refined or the budget is exhausted. We demonstrate with a face-labeling app, a "Where's Waldo?" app, and a "calorie counting camera" app.
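
For readers unfamiliar with this kind of procedure, the loop described in the abstract (declare a precision and a budget, then keep sampling until one of them is met) can be illustrated roughly as follows. This is only a minimal Python sketch under stated assumptions, not the speaker's system or its API: the function names, the parameters, the simulated crowd, and the use of a percentile bootstrap to form the interval are all hypothetical choices made for illustration.

```python
import random
import statistics


def bootstrap_interval(samples, statistic, confidence=0.95, n_boot=500, rng=None):
    """Percentile-bootstrap interval for an arbitrary statistic.

    Assumption: the bootstrap is used here purely for illustration.
    """
    rng = rng or random.Random(0)
    n = len(samples)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    estimates = sorted(
        statistic([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = int((1 - confidence) / 2 * n_boot)
    hi_idx = n_boot - 1 - lo_idx
    return estimates[lo_idx], estimates[hi_idx]


def estimate_from_crowd(ask_worker, statistic, target_width, budget,
                        cost_per_response=1, batch_size=10, rng=None):
    """Collect responses in batches until the interval is narrow enough
    or the budget is spent. Costs and budget are integers (e.g. cents).

    ask_worker is a hypothetical stand-in for posting one crowd task.
    """
    rng = rng or random.Random(0)
    samples, spent = [], 0
    low = high = None
    while spent + cost_per_response <= budget:
        # Buy one more batch, never paying more than the budget allows.
        affordable = (budget - spent) // cost_per_response
        for _ in range(min(batch_size, affordable)):
            samples.append(ask_worker())
            spent += cost_per_response
        low, high = bootstrap_interval(samples, statistic, rng=rng)
        if high - low <= target_width:
            break
    return {
        "estimate": statistic(samples) if samples else None,
        "interval": (low, high),
        "responses": len(samples),
        "spent": spent,
        "converged": high is not None and high - low <= target_width,
    }


if __name__ == "__main__":
    # Simulated crowd: noisy calorie guesses around a made-up true value.
    crowd = random.Random(42)
    result = estimate_from_crowd(
        ask_worker=lambda: crowd.gauss(500, 50),
        statistic=statistics.median,
        target_width=15,
        budget=300,
    )
    print(result)
```

The caller supplies the statistic (here the median), so the same loop applies to any user-defined function of the responses; the `converged` flag tells the caller whether sampling stopped because the requested precision was reached or because the budget ran out.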

Speaker Bios:

Abhishek Dwaraki is a fourth-year Ph.D. student in the Department of Electrical and Computer Engineering working with Prof. Tilman Wolf. His research currently explores various facets of software-defined networking and the feasibility of using information versioning to aid in a variety of network management scenarios. His interests also include large-scale distributed systems.

Rick Freedman is a Ph.D. candidate in the College of Information and Computer Sciences at the University of Massachusetts Amherst and was a JSPS Summer Fellow at the University of Tokyo for the summer of 2015. His research interests lie at the intersection of artificial intelligence planning; plan, activity, and intent recognition; human-computer/robot interaction; topic modeling; knowledge representation; and statistical-relational methods. He uses interdisciplinary approaches to develop systems that adaptively interact with human users through understanding their actions in the environment.

Daniel Barowy is a Ph.D. candidate in the PLASMA lab at the College of Information and Computer Sciences, University of Massachusetts Amherst, advised by Professor Emery Berger. Daniel's work focuses on enforcing constraints on data quality at the programming language level. Prior work includes CheckCell, an input debugger for Excel (OOPSLA '14), FlashRelate, an automated data-wrangling tool (PLDI '15; Distinguished Artifact Award winner), and AutoMan, a crowdprogramming DSL for Scala (OOPSLA '12; forthcoming CACM Research Highlight).

In case of questions, contact: Nicholas Monath