PhD Thesis Defense: Brett Mullins, Practical Algorithms for Differentially Private Marginal Query Answering
Content
Speaker:
Abstract:
Differential privacy is the dominant standard for formal and quantifiable data privacy and has been used in major deployments that impact millions of people such as the 2020 US Decennial Census. Marginals (frequency distributions over subsets of attributes) are an important class of queries crucial for descriptive statistics, downstream analyses (e.g., regression modeling), and synthetic data generation. This thesis develops and analyzes principled, efficient, and scalable algorithms for answering marginal queries under differential privacy for discrete tabular data.
First, we study a key subproblem, reconstruction, that estimates marginals from noisy answers to other privately measured marginal queries. We show that marginals can be efficiently decomposed into a related class of queries called residuals and efficiently recombined to produce answers to additional marginal queries, a method we call GReM. Second, we propose an end-to-end private query answering mechanism, AIM+GReM, that combines intelligent selection of which marginals to measure, private measurement utilizing residual decomposition, and GReM-based scalable reconstruction. AIM+GReM achieves favorable running time-accuracy tradeoffs relative to state-of-the-art query answering mechanisms.
Third, we propose PwR, a mechanism for incorporating publicly known information directly into private marginal measurement. When the total count of records or some attributes are publicly known, PwR strictly improves accuracy over baseline approaches while maintaining privacy guarantees, favorably shifting the privacy-utility frontier for marginal measurement. Fourth, we study the problem of high-utility and scalable reconstruction of marginals from noisy measurements. We propose Local-MaxEnt, a reconstruction method that leverages the structure of residuals to bypass the batch optimization problem that is the bottleneck for existing reconstruction methods, achieving highly scalable, parallelizable reconstruction with favorable accuracy relative to existing methods.
Advisor:
Daniel Sheldon