PhD Dissertation Proposal Defense: Cecilia Ferrando, "Differentially Private Statistical Learning: Uncertainty Estimation and Utility Preservation"
Speaker
Cecilia Ferrando
Abstract
Organizations increasingly rely on sensitive data for high-impact decisions, making it essential to develop methods that ensure individual privacy while preserving the utility of downstream analyses. Differential privacy (DP) has emerged as the gold standard for such guarantees. This thesis contributes novel methods for differentially private statistical learning, with a focus on two goals: improving the usability of private inference through rigorous uncertainty quantification, and improving the accuracy of private estimators via data-dependent mechanisms. First, we develop parametric bootstrap methods for DP confidence intervals, enabling uncertainty estimation through post-processing while accounting for both sampling noise and privacy noise. This approach avoids repeated access to the data and remains accurate even in small-sample regimes. Second, we propose data-dependent sufficient statistic perturbation, a framework that applies private query-answering algorithms to release sufficient statistics, improving estimation in both linear and logistic regression. Third, we introduce a data-dependent method for private multivariate Gaussian covariance estimation: instead of perturbing the entire matrix, we iteratively select the most informative covariance entries to measure under our privacy budget, then reconstruct the full positive semidefinite matrix via a maximum-entropy completion. By concentrating noise where it matters most, we expect this strategy to yield high-utility covariance estimates and to pave the way for new methods for generating continuous synthetic data. These contributions aim to bridge theory and practice, advancing the applicability of differential privacy in statistical machine learning.
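
As a rough illustration of the first contribution, the sketch below forms a parametric-bootstrap confidence interval around a Laplace-noised mean. It is a minimal toy, not the method defended in the thesis: it assumes Gaussian data with a known standard deviation, clipping to a known range, and the pure-DP Laplace mechanism. Each bootstrap replicate redraws both a synthetic dataset from the fitted model (sampling noise) and a fresh run of the mechanism (privacy noise), so the interval accounts for both sources of uncertainty while touching the sensitive data only once.

    import numpy as np

    def private_mean(x, lo, hi, eps, rng):
        # eps-DP mean via the Laplace mechanism; clipping to [lo, hi]
        # bounds the L1 sensitivity of the mean at (hi - lo) / n.
        n = len(x)
        sensitivity = (hi - lo) / n
        return np.clip(x, lo, hi).mean() + rng.laplace(scale=sensitivity / eps)

    def bootstrap_ci(mu_priv, n, sigma, lo, hi, eps, alpha=0.05, B=2000, seed=0):
        # Parametric bootstrap: simulate datasets from the fitted model
        # N(mu_priv, sigma^2), rerun the same private mechanism on each,
        # and read a pivotal interval off the replicate percentiles.
        # Pure post-processing: the sensitive data is never touched again.
        rng = np.random.default_rng(seed)
        reps = np.array([
            private_mean(rng.normal(mu_priv, sigma, size=n), lo, hi, eps, rng)
            for _ in range(B)
        ])
        lo_q, hi_q = np.quantile(reps - mu_priv, [alpha / 2, 1 - alpha / 2])
        return mu_priv - hi_q, mu_priv - lo_q

    rng = np.random.default_rng(1)
    x = rng.normal(0.5, 0.1, size=500)
    mu_priv = private_mean(x, 0.0, 1.0, eps=1.0, rng=rng)
    print(bootstrap_ci(mu_priv, n=500, sigma=0.1, lo=0.0, hi=1.0, eps=1.0))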
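
The second contribution builds on sufficient statistic perturbation (SSP). The sketch below shows only the standard, data-independent SSP baseline for linear regression, not the data-dependent query-answering layer the thesis proposes: perturb the sufficient statistics X^T X and X^T y with Gaussian noise, then solve the noisy normal equations. The noise scale sigma is a stand-in; calibrating it to an (epsilon, delta) budget from the row-norm bound is omitted.

    import numpy as np

    def ssp_linear_regression(X, y, sigma, rng):
        # SSP baseline: privatize the sufficient statistics X^T X and X^T y,
        # then post-process them into regression coefficients. Assumes rows
        # of X and responses y are preprocessed to bounded norm so the
        # statistics have bounded sensitivity; `sigma` stands in for the
        # Gaussian noise scale implied by the privacy budget.
        d = X.shape[1]
        noise = rng.normal(0.0, sigma, size=(d, d))
        noise = np.triu(noise) + np.triu(noise, 1).T   # symmetric noise
        XtX_priv = X.T @ X + noise
        Xty_priv = X.T @ y + rng.normal(0.0, sigma, size=d)
        # A small ridge term keeps the noisy Gram matrix well conditioned.
        return np.linalg.solve(XtX_priv + 1e-3 * np.eye(d), Xty_priv)

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(1000, 3))
    y = X @ np.array([0.5, -0.25, 1.0]) + rng.normal(0.0, 0.1, size=1000)
    print(ssp_linear_regression(X, y, sigma=1.0, rng=rng))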
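
For the third contribution, the rule for selecting the most informative entries is the thesis's own and is not reproduced here; the sketch below illustrates only the reconstruction step, under simplifying assumptions. Given a partially measured covariance matrix, it computes the maximum-entropy completion, i.e. the positive definite completion of maximal log-determinant, using Dempster-style coordinate updates: each unmeasured entry is set so that the matching entry of the inverse becomes zero, which is also the per-coordinate determinant maximizer. It assumes the diagonal is among the measured entries and that the zero-filled starting matrix is positive definite.

    import numpy as np

    def max_entropy_completion(partial, mask, sweeps=100):
        # Fill unmeasured entries of a partial covariance with the
        # maximum-entropy (maximum log-det) completion. Each coordinate
        # update zeros the partial covariance of i and j given the rest,
        # which forces the (i, j) entry of the inverse to zero.
        d = partial.shape[0]
        Sigma = np.where(mask, partial, 0.0)
        missing = [(i, j) for i in range(d) for j in range(i + 1, d)
                   if not mask[i, j]]
        for _ in range(sweeps):
            for i, j in missing:
                rest = [k for k in range(d) if k not in (i, j)]
                A = Sigma[np.ix_(rest, rest)]
                val = Sigma[i, rest] @ np.linalg.solve(A, Sigma[rest, j])
                Sigma[i, j] = Sigma[j, i] = val
        return Sigma

    measured = np.array([[1.0, 0.6, 0.0],
                         [0.6, 1.0, 0.5],
                         [0.0, 0.5, 1.0]])
    mask = np.array([[True, True, False],
                     [True, True, True],
                     [False, True, True]])
    print(max_entropy_completion(measured, mask))   # fills entry (0, 2) with 0.3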
Advisor
Dan Sheldon