PhD Thesis Defense: Arisa Tajima, Advancing End-to-End Privacy in Machine Learning: Input, Output, and Beyond
Speaker: Arisa Tajima
Abstract:
As machine learning is increasingly deployed in high-stakes applications, ensuring strong privacy guarantees while maintaining accuracy and efficiency remains a fundamental challenge. Privacy-enhancing technologies, such as differential privacy (DP) and secure computation, provide rigorous protections, but their direct application often incurs significant tradeoffs that limit practical adoption. This dissertation advances the design of practical privacy-preserving machine learning systems by co-designing algorithms, model architectures, and cryptographic protocols to better balance privacy, accuracy, and efficiency.
We begin by studying output privacy in machine learning. We introduce a DP mechanism for random forest classifiers based on a novel matrix representation that enables optimized noise allocation, significantly improving accuracy. We further examine the impact of DP mechanisms in real-world settings through a case study on Census data releases for redistricting, evaluating how privacy constraints affect downstream fairness validation.
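To make the output-privacy idea concrete, the sketch below applies the standard Laplace mechanism to the per-leaf class counts of a decision tree. This is a generic illustration of DP noise on tree statistics, not the thesis's optimized matrix-based noise allocation; the function name and the example counts are hypothetical.

```python
import numpy as np

def noisy_leaf_counts(counts, epsilon, rng=None):
    """Add Laplace(1/epsilon) noise to leaf class counts.

    Each training record lands in exactly one leaf, so the counts
    have L1 sensitivity 1 and this release is epsilon-DP.
    """
    rng = np.random.default_rng(rng)
    counts = np.asarray(counts, dtype=float)
    return counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)

# Hypothetical tree with two leaves and two classes; predict the
# majority class from the noisy counts in each leaf.
leaf_counts = np.array([[30, 2], [5, 40]])  # rows: leaves, cols: classes
noisy = noisy_leaf_counts(leaf_counts, epsilon=1.0, rng=0)
predictions = noisy.argmax(axis=1)
```

The accuracy cost of this naive scheme grows with the number of leaves, which is exactly the kind of overhead that smarter noise allocation across the forest aims to reduce.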
Next, we address input privacy by developing a secure protocol for fine-tuning language models using secure multiparty computation (MPC). To overcome numerical instability in MPC-based training, we propose an adaptive loss scaling technique that enables stable and efficient training of transformer-based models under secure computation constraints.
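The general shape of adaptive loss scaling can be sketched as follows, in the spirit of dynamic loss scaling from mixed-precision training: grow the scale while gradients stay representable, and back off when they overflow. This is an illustrative sketch only; the class name and all hyperparameters are assumptions, and the thesis's MPC-specific variant may differ.

```python
import numpy as np

class AdaptiveLossScaler:
    """Dynamic loss scaling (illustrative sketch).

    The loss is multiplied by `scale` before the backward pass to keep
    gradients inside the representable range; gradients are divided by
    `scale` before the optimizer step. On overflow, the step is skipped
    and the scale is reduced; after a run of stable steps, it is grown.
    """

    def __init__(self, init_scale=2.0**10, growth=2.0, backoff=0.5, interval=100):
        self.scale = init_scale
        self.growth, self.backoff, self.interval = growth, backoff, interval
        self._good_steps = 0

    def scale_loss(self, loss):
        return loss * self.scale

    def update(self, grads):
        """Return unscaled grads, or None if this step should be skipped."""
        if any(not np.all(np.isfinite(g)) for g in grads):
            self.scale *= self.backoff   # overflow: shrink scale, skip step
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps % self.interval == 0:
            self.scale *= self.growth    # stable for a while: grow scale
        return [g / self.scale for g in grads]
```

Under MPC, values are typically fixed-point with a narrow dynamic range, so the same grow/back-off logic would guard against wrap-around rather than float16 overflow; the sketch above only shows the control flow.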
Finally, we propose hybrid approaches that combine DP with cryptographic primitives to achieve both input and output privacy. Focusing on structured workloads such as counting queries, we selectively replace stronger cryptographic components with DP-aware ones, substantially reducing computational overhead while preserving the desired privacy guarantees.
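As a minimal illustration of combining the two kinds of protection on a counting query, the sketch below additively secret-shares each client's count (input privacy) and has one server add discrete-Laplace noise inside the shares, so only a differentially private total is ever reconstructed (output privacy). All names, the modulus, and the three-server setup are assumptions for illustration, not the thesis's protocol.

```python
import numpy as np

P = 2**61 - 1  # shares live in the integers mod this prime

def share(value, n_parties, rng):
    """Additively secret-share an integer: the shares sum to value mod P."""
    shares = [int(rng.integers(0, P)) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def discrete_laplace(epsilon, rng):
    """Two-sided geometric noise (a discrete analogue of Laplace(1/epsilon))."""
    p = 1.0 - np.exp(-epsilon)
    return int(rng.geometric(p) - rng.geometric(p))

rng = np.random.default_rng(0)
counts = [3, 5, 2]                              # each client's private count
all_shares = [share(c, 3, rng) for c in counts]

# One server adds DP noise directly to its share: a counting query has
# sensitivity 1, so the reconstructed total is epsilon-DP.
all_shares[0][0] = (all_shares[0][0] + discrete_laplace(1.0, rng)) % P

server_sums = [sum(col) % P for col in zip(*all_shares)]
noisy_total = sum(server_sums) % P
if noisy_total > P // 2:                        # map large residues back
    noisy_total -= P                            # to negative integers
```

The point of the hybrid design is visible even in this toy: the expensive cryptographic machinery only needs to protect the aggregation step, while the (cheap) DP noise covers what the revealed output leaks.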
Advisor: Amir Houmansadr