Speaker:

Arisa Tajima

Abstract:

As machine learning is increasingly deployed in high-stakes applications, ensuring strong privacy guarantees while maintaining accuracy and efficiency remains a fundamental challenge. Privacy-enhancing technologies, such as differential privacy (DP) and secure computation, provide rigorous protections, but their direct application often incurs significant tradeoffs that limit practical adoption. This dissertation advances the design of practical privacy-preserving machine learning systems by co-designing algorithms, model architectures, and cryptographic protocols to better balance privacy, accuracy, and efficiency.
 
We begin by studying output privacy in machine learning. We introduce a DP mechanism for random forest classifiers based on a novel matrix representation that enables optimized noise allocation, significantly improving accuracy. We further examine the impact of DP mechanisms in real-world settings through a case study on Census data releases for redistricting, evaluating how privacy constraints affect downstream fairness validation.
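To make the output-privacy setting concrete, here is a minimal sketch of the baseline Laplace approach for a single random-forest leaf: class counts have sensitivity 1, so adding Laplace noise of scale 1/epsilon to each count satisfies epsilon-DP for that leaf. This is illustrative only; the dissertation's mechanism improves on this baseline through a matrix representation that optimizes how noise is allocated, which is not shown here.

```python
import numpy as np

def dp_leaf_counts(counts, epsilon, rng=None):
    """Release a leaf's class counts under epsilon-DP (baseline sketch).

    Each count changes by at most 1 when one example is added or removed,
    so Laplace noise with scale 1/epsilon is calibrated to sensitivity 1.
    """
    rng = rng or np.random.default_rng(0)  # fixed seed for reproducibility
    counts = np.asarray(counts, dtype=float)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    return np.clip(noisy, 0, None)  # counts cannot be negative

# Hypothetical leaf: 30 examples of class 0, 5 of class 1, epsilon = 1.0
noisy = dp_leaf_counts([30, 5], epsilon=1.0)
predicted = int(np.argmax(noisy))  # prediction from the noisy counts
```

With a clear majority, the noisy argmax almost always matches the true class; accuracy loss appears when leaves are small relative to the noise scale, which is exactly the regime that optimized noise allocation targets.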
 
Next, we address input privacy by developing a secure protocol for fine-tuning language models using secure multiparty computation (MPC). To overcome numerical instability in MPC-based training, we propose an adaptive loss scaling technique that enables stable and efficient training of transformer-based models under secure computation constraints.
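The idea behind loss scaling can be sketched in a few lines: multiply the loss by a scale factor so gradients stay inside a limited numeric range (such as the fixed-point representation used in MPC), then unscale before the optimizer step, shrinking the scale on overflow and growing it after a run of stable steps. The class below is a hypothetical toy version of that control loop, not the protocol-level technique from the dissertation.

```python
import numpy as np

class AdaptiveLossScaler:
    """Toy adaptive loss scaler (illustrative, plaintext-only sketch)."""

    def __init__(self, scale=2.0**8, growth=2.0, backoff=0.5, interval=100):
        self.scale = scale        # current loss multiplier
        self.growth = growth      # factor applied after `interval` good steps
        self.backoff = backoff    # factor applied on overflow
        self.interval = interval
        self._good_steps = 0

    def scale_loss(self, loss):
        return loss * self.scale

    def update(self, grads):
        """Unscale gradients; on overflow, shrink the scale and skip the step."""
        grads = np.asarray(grads, dtype=float)
        if not np.all(np.isfinite(grads)):
            self.scale *= self.backoff
            self._good_steps = 0
            return None  # caller skips this optimizer step
        self._good_steps += 1
        if self._good_steps >= self.interval:
            self.scale *= self.growth
            self._good_steps = 0
        return grads / self.scale

scaler = AdaptiveLossScaler()
g = scaler.update([256.0, 512.0])  # finite grads: unscaled by 2**8
skipped = scaler.update([np.inf])  # overflow: scale halves, step skipped
```

In a secure-computation setting the same policy would have to run over secret-shared values, where overflow detection itself is nontrivial; the adaptive part of the dissertation's technique addresses that constraint.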
 
Finally, we propose hybrid approaches that combine DP with cryptographic primitives to achieve both input and output privacy. Focusing on structured workloads such as counting queries, we selectively replace stronger cryptographic components with DP-aware ones, substantially reducing computational overhead while preserving the desired privacy guarantees.
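A counting query is the simplest workload where this tradeoff shows up: the count has sensitivity 1, so the Laplace mechanism with scale 1/epsilon already provides output privacy, and in a hybrid design such cheap DP noise can stand in for heavier cryptographic machinery on parts of the pipeline. The function below is a minimal plaintext sketch of that DP component (the function name and data are hypothetical).

```python
import numpy as np

def dp_count(values, predicate, epsilon, rng=None):
    """Answer a counting query under epsilon-DP via the Laplace mechanism.

    Adding or removing one record changes the count by at most 1
    (sensitivity 1), so Laplace noise of scale 1/epsilon suffices.
    """
    rng = rng or np.random.default_rng(0)  # fixed seed for reproducibility
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(scale=1.0 / epsilon)

# Hypothetical data: how many individuals are at least 40 years old?
ages = [23, 35, 41, 29, 52, 61, 18, 44]
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0)  # true answer is 4
```

In a full hybrid protocol the inputs would be secret-shared across parties rather than visible to one machine; the point of the sketch is that the DP noise addition itself is far cheaper than computing the same query entirely under secure computation.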

Advisor:

Amir Houmansadr