Faculty Recruiting Support CICS

Improving Data Science Pipelines via Interactive Visual Analytics

Tuesday, 07/10/2018 11:00am
Computer Science Building, Room 150
Special Event

Abstract: The ever-increasing availability of digitized data, along with the commoditization of machine learning (ML), has led to skyrocketing demand for data-driven solutions in a broad range of real-world domains. However, the usability of the end-to-end data science pipeline, from data discovery to exploratory data analysis to model explanation and performance analysis, is in dire need of improvement. I will describe some recent work that addresses this problem by integrating scalable guided visual analysis into the pipeline as a first-class citizen.
In the first part of my talk, I will introduce a visual interaction framework that lets users change the input and output of black-box ML models and immediately observe the effects. Our framework improves model utilization and explainability through the what-if analysis enabled by this bidirectional coupling. I will demonstrate the efficacy of this approach for ML-based dimensionality reduction algorithms, introducing forward and backward projection interactions along with novel visualizations that facilitate these interactions. I will give exploration examples from dimensionality reductions using PCA, autoencoders, and t-SNE. I will also discuss connections of our work to influence functions, saliency maps, occlusion experiments, and other black-box inspection and visualization techniques.
Standard performance metrics for ML models (e.g., error rate, precision, and recall) are useful for reporting results but inadequate for revising and debugging the models. In the second part of my talk, I'll present Track Xplorer, an interactive visualization system to query, analyze, and compare the predictions of sensor-data classifiers. Through coordinated track visualizations that represent temporally aligned predictions, Track Xplorer enables users to interactively explore and compare the results of different classifiers, assessing their accuracy with respect to the ground-truth labels and video. Track Xplorer contributes an extensible visual algebra over track representations to filter, compose, and compare classification outputs, enabling users to reason effectively about classifier performance. Through integration with a version control system, Track Xplorer also supports tracking of models and their parameters without additional workload on model developers.
I'll conclude by giving a progress report on my continuing efforts around scalable guided exploratory analysis and visualization design and discussing avenues of future research.
Bio: Cagatay Demiralp is a visual analytics and data visualization researcher. His current work focuses on two themes: (1) automating visual data exploration for scalable guided data analysis and (2) developing interactive tools that facilitate iterative visual data and model experimentation. Cagatay is currently visiting the database group at Columbia University. Until recently, he was with IBM Research. From 2012 to 2014, he was a postdoc at Stanford University and a member of the Interactive Data Lab at the University of Washington. He obtained his PhD from Brown University. Cagatay also co-founded Fitnescity, a precision wellness startup based in NYC.
