PhD Dissertation Proposal: Brett Mullins, Practical Algorithms for Differentially Private Marginal Query Answering
Content
Speaker:
Brett Mullins
Abstract:
Differential privacy is the dominant standard for formal and quantifiable data privacy and has been used in major deployments that impact millions of people such as the 2020 US Decennial Census. Marginals (frequency distributions over subsets of attributes) are an important class of query crucial for descriptive statistics, downstream analyses (e.g., regression modeling), and synthetic data generation. As an example on a demographics dataset, the marginal over age group and education level counts the number of individuals for each combination of age group and level of education.
This thesis develops and analyzes principled, efficient, and scalable algorithms for answering marginal queries under differential privacy. First, we study a key subproblem, reconstruction, that estimates marginals from noisy answers to other marginal queries that have been measured privately. We show that marginals can be efficiently decomposed into a related class of queries called residuals and efficiently recombined to infer answers to additional marginal queries, a method we call GReM. Second, we propose an end-to-end private query answering mechanism, AIM+GReM, that combines intelligent selection of which marginal queries to measure with GReM-based scalable reconstruction. AIM+GReM achieves favorable running time-accuracy tradeoffs relative to state-of-the-art query answering mechanisms. Third, we return to the reconstruction subproblem and study reconstruction methods that impose consistency and non-negativity on the reconstructed marginals, trading off some scalability for lower error answers.
Advisor:
Daniel Sheldon