PhD Dissertation Proposal Defense: Nicholas Perello, Towards Fair and Explainable Artificial Intelligence
Speaker
Nicholas Perello
Abstract
With the rapid, widespread adoption of automated systems for decision-making, lawmakers and end-users have called for assurances of the trustworthiness and safety of these systems. That is, these systems should make decisions based on features relevant to the domain, be non-discriminatory, and inform end-users about how to improve their outcomes. To this end, researchers have proposed explainability techniques, discrimination measures, and fair learning methods for machine learning. This thesis intersects these topics to introduce methods that (i) prevent discrimination in supervised learning in an explainable way and (ii) provide fair explanations and algorithmic recourse.
In this thesis, we first focus on discrimination in supervised learning. We model discrimination as a dataset shift that poisons the training data with a discriminatory impact from a protected feature, such as race or gender. We then define a fair learning method, which may train on discriminatory datasets, as one that produces a model that performs well on fair test datasets. Our learning algorithm averages probabilistic interventions on the protected feature and optimizes loss under discriminatory dataset shifts. Building on this work, we model discrimination using explainability and causal influence measures, and introduce a method that removes this discrimination by minimizing the influence of the protected features on decisions while preserving the influence of proxy features.
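As a rough, non-authoritative illustration of the intervention-averaging idea, the sketch below averages a trained classifier's predictions over interventions that set the protected feature to each of its possible values. The base model, column names, and uniform intervention weights are assumptions for illustration; the actual learning algorithm in the thesis additionally optimizes loss under discriminatory dataset shifts.

# Minimal sketch: average predictions over interventions do(protected = v).
# The classifier, feature names, and uniform weights are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def fit_base_model(X: pd.DataFrame, y: pd.Series) -> LogisticRegression:
    # Train a standard classifier on (possibly discriminatory) data.
    return LogisticRegression(max_iter=1000).fit(X, y)

def intervened_predict_proba(model, X: pd.DataFrame, protected: str, values, weights=None):
    # Average positive-class probabilities over interventions on the protected feature.
    weights = np.full(len(values), 1.0 / len(values)) if weights is None else np.asarray(weights)
    proba = np.zeros(len(X))
    for v, w in zip(values, weights):
        X_do = X.copy()
        X_do[protected] = v  # intervention: set the protected feature to v for everyone
        proba += w * model.predict_proba(X_do)[:, 1]
    return proba

# Hypothetical usage (column names are placeholders):
# model = fit_base_model(train_df.drop(columns=["label"]), train_df["label"])
# scores = intervened_predict_proba(model, test_df.drop(columns=["label"]), "gender", values=[0, 1])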
Next, this thesis focuses on explainability techniques. First, we introduce a data-driven, step-based approach for counterfactual explanations, also known as algorithmic recourse. This approach is highly customizable with respect to the constraints of the dataset, e.g., features that may only change unidirectionally. We present a comprehensive empirical evaluation showing that our method performs well across various recourse metrics, despite not optimizing for them. Then, in response to observations of disparate recourse performance across protected groups, we introduce discrimination measures for algorithmic recourse. We extend prior work on welfare-based group-fair supervised learning to measure group welfare under multiple recourse objectives. Finally, we propose a recourse method that maximizes group welfare and empirically show its robust performance across multiple recourse objectives.
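To convey the flavor of a step-based recourse search, here is a minimal greedy sketch that respects a simple unidirectional constraint. It is not the data-driven method or the welfare-maximizing method from the thesis; the step sizes, scoring model, and stopping rule are assumptions.

# Minimal sketch: greedy step-based recourse under a unidirectional constraint.
# Step sizes, feature handling, and the scoring model are illustrative assumptions.
def step_based_recourse(x, predict_proba, steps, increase_only=(), threshold=0.5, max_steps=20):
    # x             : dict mapping feature name -> current value
    # predict_proba : callable returning P(positive outcome | features)
    # steps         : dict mapping feature name -> allowed step size
    # increase_only : features constrained to change in one direction only
    x = dict(x)
    path = []
    for _ in range(max_steps):
        if predict_proba(x) >= threshold:
            return path, x  # recourse found
        best_feat, best_delta, best_score = None, None, predict_proba(x)
        for feat, delta in steps.items():
            directions = (delta,) if feat in increase_only else (delta, -delta)
            for signed in directions:
                candidate = dict(x, **{feat: x[feat] + signed})
                score = predict_proba(candidate)
                if score > best_score:
                    best_feat, best_delta, best_score = feat, signed, score
        if best_feat is None:
            break  # no single step improves the score
        x[best_feat] += best_delta
        path.append((best_feat, best_delta))
    return path, x

Group-level welfare could then be assessed, for example, by aggregating the lengths or costs of the recourse paths found for individuals within each protected group; the choice of aggregation is a design decision not fixed by this sketch.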
For the next stage of this thesis, we propose novel explanation techniques for more complex settings, such as reinforcement learning (RL) and large language models (LLMs). For the former, our goal is to explain the behavior of autonomous agents in a human-interpretable manner so that users can verify whether the agents align with their goals. For LLMs, we propose a multi-path, step-based recourse approach motivated by our prior step-based approach. We apply this recourse technique to LLM-based decision-making to evaluate the impact that phrases associated with protected and non-protected features have on decisions.
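As a rough illustration of the kind of phrase-level impact evaluation this stage envisions, the snippet below estimates a phrase's effect on an LLM-based decision by substitution. The llm_decision callable, the substitution scheme, and the averaging are hypothetical placeholders, not the proposed multi-path recourse method.

# Minimal sketch: estimate a phrase's impact on an LLM-based decision by substitution.
# `llm_decision` is a hypothetical callable (e.g., a wrapper around an LLM API)
# that maps a text profile to a decision score in [0, 1].
def phrase_impact(profile: str, phrase: str, substitutes, llm_decision) -> float:
    # Average change in the decision score when `phrase` is replaced by each substitute.
    base = llm_decision(profile)
    shifts = [llm_decision(profile.replace(phrase, s)) - base for s in substitutes]
    return sum(shifts) / len(shifts)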
Advisors
Przemyslaw Grabowicz and Yair Zick