PhD Thesis Defense: Cecilia Ferrando, Differentially Private Statistical Learning: Uncertainty Estimation and Utility Preservation
Speaker: Cecilia Ferrando
Abstract:
Organizations increasingly rely on sensitive data for high-impact decisions, making it essential to develop methods that ensure individual privacy while preserving the utility of downstream analyses. Differential privacy (DP) has emerged as the gold standard for such guarantees. This thesis contributes novel methods for differentially private statistical learning, with a focus on improving the usability of private inference through rigorous uncertainty quantification and the accuracy of private estimators via data-dependent mechanisms. First, we develop parametric bootstrap methods for DP confidence intervals, enabling uncertainty estimation through post-processing while accounting for both sampling and privacy noise. This approach avoids multiple accesses to the data and remains accurate even in small-sample regimes. Second, we propose data-dependent sufficient statistic perturbation, a framework that applies private query-answering algorithms to release sufficient statistics, improving estimation in both linear and logistic regression. Third, we introduce a data-dependent method for private multivariate Gaussian covariance estimation: instead of perturbing the entire matrix, we iteratively select the most informative covariance entries to measure under the privacy budget, then reconstruct the full positive semidefinite matrix via a maximum-entropy completion. By concentrating noise where it matters most, we show that this strategy yields high-utility covariance estimates, paving the way for novel continuous synthetic data generation methods. These contributions aim to bridge theory and practice, advancing the applicability of differential privacy in statistical machine learning.
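To make the first two contributions concrete, here is a minimal sketch of differentially private linear regression via sufficient statistic perturbation, paired with a parametric-bootstrap confidence interval computed purely by post-processing. This is an illustrative simplification, not the thesis's data-dependent mechanism: it uses plain Laplace noise with a crude sensitivity bound, and all names (`ssp_linreg`, `parametric_bootstrap_ci`), the clipping bound, and the ridge stabilizer are assumptions for the sketch.

```python
import numpy as np

def ssp_linreg(X, y, epsilon, bound=1.0, rng=None):
    """DP linear regression via sufficient statistic perturbation (sketch).

    Assumes each entry of X and y is clipped to [-bound, bound], so changing
    one row perturbs the stacked statistics (X^T X, X^T y) by at most
    bound**2 * (d*d + d) in L1 norm (a deliberately crude bound).
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    X = np.clip(X, -bound, bound)
    y = np.clip(y, -bound, bound)
    sens = bound**2 * (d * d + d)           # crude L1 sensitivity bound
    scale = sens / epsilon                  # Laplace scale b = sensitivity / epsilon
    xtx = X.T @ X + rng.laplace(0.0, scale, size=(d, d))
    xtx = (xtx + xtx.T) / 2                 # symmetrize the noisy Gram matrix
    xty = X.T @ y + rng.laplace(0.0, scale, size=d)
    # Post-processing: solve the noisy normal equations (ridge for stability).
    return np.linalg.solve(xtx + 1e-3 * np.eye(d), xty)

def parametric_bootstrap_ci(beta_hat, n, d, epsilon, sigma=0.1,
                            B=200, alpha=0.05, rng=None):
    """Percentile CI for the first coefficient via the parametric bootstrap.

    Simulates data from the fitted model, re-runs the *same* private
    mechanism on each synthetic dataset, and reads off quantiles of the
    bootstrap estimates.  The real data is accessed only once, inside
    ssp_linreg, so the interval captures both sampling and privacy noise.
    """
    rng = np.random.default_rng(rng)
    boot = []
    for _ in range(B):
        Xb = rng.uniform(-1, 1, size=(n, d))
        yb = Xb @ beta_hat + rng.normal(0, sigma, size=n)
        boot.append(ssp_linreg(Xb, yb, epsilon, rng=rng)[0])
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

The key design point the sketch illustrates is that the bootstrap distribution is generated by replaying the private mechanism itself, so the resulting interval widens automatically as the privacy budget shrinks, with no additional queries against the sensitive data.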
Advisor: Dan Sheldon