
Improving Evaluation Methods for Causal Modeling Algorithms

Thursday, 10/10/2019 2:00pm to 4:00pm
CS 151
Ph.D. Dissertation Proposal Defense
Speaker: Amanda Gentzel

Causal modeling is central to many areas of artificial intelligence, including complex reasoning, planning, knowledge-base construction, robotics, explanation, and fairness. An active community of machine learning researchers develops and enhances algorithms that learn causal models from data, and this work has produced a series of impressive technical advances. However, evaluation techniques for causal modeling algorithms have remained somewhat primitive, limiting what we can learn from experimental studies of algorithm performance, constraining the types of algorithms and model representations that researchers consider, and creating a gap between theory and practice.

We argue for expanding the standard techniques for evaluating algorithms that construct causal models. Specifically, we argue for adding evaluation techniques that examine interventional measures rather than structural or observational measures, and that evaluate those measures on empirical data rather than synthetic data. We survey current evaluation practice and show that the techniques we advocate are rarely used, and we empirically demonstrate that they produce substantially different results from those obtained with structural measures and synthetic data. To improve evaluation practice in the community, we propose a protocol for generating observational-style datasets from experimental data. This will allow us to create a large number of datasets suitable for evaluating causal modeling algorithms, and we will make these datasets, and the algorithms for creating more, publicly available. Using these datasets, we will perform an extensive evaluation of current causal modeling algorithms. We will identify key performance claims for these algorithms in the literature and re-evaluate them using a significantly larger collection of datasets than was previously available.

Advisor: David Jensen