Speaker:

Elita Lobo

Abstract:

The increasing use of AI in high-stakes domains, including healthcare, supply chain management, and finance, has created a pressing need for algorithms that are not only accurate, but also reliable, fair, and trustworthy. Yet, in real-world applications, AI systems often struggle when faced with changing environments and evolving data distributions. This gap between controlled experiments and real-world performance is driven by several challenges. Many AI models are trained and evaluated on carefully curated datasets that fail to capture the full complexity and variability of real-world data. Furthermore, for many critical tasks, relevant datasets are available only in limited quantities due to privacy concerns or proprietary restrictions that prevent widespread access. As a result, models trained on limited or non-representative data tend to overfit to the training distribution, making them ill-equipped to generalize or handle even small distribution shifts. Aggravating these issues is the prevailing practice of optimizing models primarily for accuracy and efficiency. Many commonly used machine learning methods lack explicit mechanisms to address uncertainty, noise in the data, or distributional shifts. This can leave AI systems brittle and vulnerable not only to small shifts in data, but also to adversarial attacks that exploit these weaknesses. Therefore, it is imperative to develop machine learning methods that go beyond efficiency and accuracy, and that are robust under uncertainty and data limitations.

This thesis addresses these challenges by designing robust algorithms that tackle different aspects of uncertainty across a range of applications, including reinforcement learning, resource allocation, explainability, and large language models (LLMs). We begin by addressing epistemic uncertainty in offline reinforcement learning, where policies must be learned from limited, fixed datasets. To this end, we introduce a Bayesian framework that models this uncertainty and optimizes policies to maximize returns at a worst-case α-percentile, yielding policies that are robust yet less conservative than standard worst-case approaches, and that achieve higher performance.
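
As a rough illustration of the percentile criterion, the sketch below scores candidate policies by the α-percentile of their return distribution under samples from a posterior over models, and selects the best one. Everything here (policy count, posterior samples, numbers) is a synthetic placeholder, not the thesis's actual algorithm or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the posterior over MDP models is represented by
# samples, and returns[i, j] is the return of candidate policy i under
# posterior model sample j (synthetic numbers for illustration).
n_policies, n_models = 4, 1000
returns = rng.normal(loc=[[1.0], [1.2], [1.5], [1.1]],
                     scale=[[0.1], [0.3], [0.8], [0.2]],
                     size=(n_policies, n_models))

alpha = 0.05  # guard against the worst 5% of posterior models

# Percentile criterion: score each policy by the alpha-percentile of its
# returns under the posterior, then pick the highest-scoring policy.
scores = np.percentile(returns, 100 * alpha, axis=1)
best = int(np.argmax(scores))

print("alpha-percentile scores:", np.round(scores, 3))
print("selected policy:", best)
```

Unlike a pure max-min objective, which guards against the single worst posterior sample, the α-percentile criterion discards the most pessimistic tail of the posterior, which is what makes the resulting policies less conservative.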

Building on this, we extend our approach to fair resource allocation, focusing on matching markets in which a limited number of items must be assigned to agents under various constraints. When agent preferences are unknown and must be inferred from limited data, our algorithms compute robust allocations by explicitly optimizing conservative estimates of welfare objectives, preserving fairness and resilience even when preference data is scarce.
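
To make the conservative-estimation idea concrete, here is a minimal sketch under assumed simplifications (utilitarian welfare, a one-to-one matching, Gaussian noise): each agent-item value is estimated from a few noisy samples, replaced by a lower confidence bound, and the allocation then maximizes the resulting conservative welfare. The thesis handles richer welfare objectives and constraints than this toy version.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)

# Hypothetical setting: 3 agents, 3 items; each agent's value for each
# item is observed only through a handful of noisy samples.
true_values = np.array([[0.9, 0.2, 0.4],
                        [0.3, 0.8, 0.5],
                        [0.6, 0.4, 0.7]])
n_samples = 5
samples = true_values[..., None] + rng.normal(0.0, 0.2, size=(3, 3, n_samples))

# Conservative (lower-confidence-bound) estimate of each agent-item value:
# sample mean minus a confidence width that shrinks with more samples.
means = samples.mean(axis=-1)
widths = samples.std(axis=-1, ddof=1) / np.sqrt(n_samples)
lcb = means - 1.96 * widths

# Robust allocation: maximize utilitarian welfare of the conservative
# estimates (linear_sum_assignment minimizes cost, hence the negation).
rows, cols = linear_sum_assignment(-lcb)
print("agent -> item:", dict(zip(rows.tolist(), cols.tolist())))
print("conservative welfare:", round(float(lcb[rows, cols].sum()), 3))
```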

Next, we turn to the issue of explainability. Popular explanation methods such as LIME and SHAP rely on local approximations of model behavior, which makes their explanations unstable and susceptible to adversarial manipulation. To address this, we propose a framework for generating model-precise explanations that remain consistent under such attacks, thereby enhancing both interpretability and trust in model outputs.
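
The instability mentioned above can be seen directly: a LIME-style explanation fits a weighted linear surrogate to random perturbations around the input, so its attributions depend on the perturbation draw. The toy sketch below illustrates the problem, not our proposed framework; the model and the point being explained are made up.

```python
import numpy as np

# A toy black-box model with a mildly nonlinear decision surface.
w_true = np.array([1.0, -2.0, 0.5])

def black_box(X):
    return X @ w_true + 0.5 * np.sin(5 * X[:, 0])

x0 = np.array([0.3, -0.1, 0.8])  # the point to explain

def lime_style_attribution(seed, n=50, sigma=0.5):
    """Fit a LIME-style local linear surrogate around x0 and return
    its coefficients as feature attributions."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(0.0, sigma, size=(n, 3))                 # local perturbations
    y = black_box(x0 + Z)
    k = np.exp(-np.sum(Z ** 2, axis=1) / (2 * sigma ** 2))  # proximity kernel
    A = np.hstack([Z, np.ones((n, 1))])                     # linear model + intercept
    Aw = A * k[:, None]
    coef = np.linalg.solve(Aw.T @ A, Aw.T @ y)              # weighted least squares
    return coef[:3]

# Re-running the same explanation with different perturbation seeds
# changes the attributions: the instability discussed above.
for seed in (0, 1, 2):
    print(seed, np.round(lime_style_attribution(seed), 3))
```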

Finally, we investigate counterfactual robustness in LLM-based math verifiers. Because these verifiers are trained on data generated by the very models they are designed to evaluate, they struggle to generalize to related tasks or shifted distributions. We introduce a counterfactual training strategy that improves robustness to such shifts, thereby increasing the reliability of LLM-based reasoning systems.
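
The toy sketch below conveys the flavor of counterfactual augmentation for a verifier: operands in each training problem are perturbed and the correctness label is recomputed from scratch, so a verifier trained on the result must actually check the arithmetic rather than memorize surface patterns of the generating model. The real training strategy operates on LLM-scale verification data; everything here is a hypothetical stand-in.

```python
import random

random.seed(0)

# A verifier training example: a problem, a candidate answer, and a label
# saying whether the candidate is correct.
def make_example(a, b, answer):
    return {"question": f"What is {a} + {b}?",
            "candidate": str(answer),
            "label": int(answer == a + b)}

def counterfactual():
    """Draw fresh operands and relabel from scratch, keeping labels
    balanced, so correctness cannot be inferred from surface form."""
    a, b = random.randint(1, 99), random.randint(1, 99)
    if random.random() < 0.5:
        answer = a + b                                   # correct candidate
    else:
        answer = a + b + random.choice([-2, -1, 1, 2])   # corrupted candidate
    return make_example(a, b, answer)

seed_data = [make_example(2, 3, 5), make_example(7, 8, 14)]
augmented = seed_data + [counterfactual() for _ in range(4)]
for ex in augmented:
    print(ex)
```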

Taken together, these contributions advance the development of robust and reliable AI/ML systems that can operate effectively under uncertainty and minor distribution shifts. By tackling robustness challenges in sequential decision-making, fair allocation, explainability, and LLMs, this work supports the broader goal of deploying trustworthy AI in complex real-world environments.

Advisor:

Yair Zick