
Improving Variational Inference Through Advanced Stochastic Optimization Techniques

Friday, 06/30/2023 10:00am to 12:00pm
CS 203
PhD Dissertation Proposal Defense
Speaker: Javier Burroni

Black-box variational inference (VI) is a crucial tool in probabilistic machine learning, offering an alternative approach to performing Bayesian inference. By requiring only black-box access to the model and its gradients, it recasts complex inference tasks as more manageable optimization problems, making it possible to approximate intricate posterior distributions across a wide range of models. However, black-box VI faces a fundamental challenge: managing the noise introduced by stochastic gradient optimization methods, which stands in the way of efficient approximation. This thesis presents new approaches that enhance the efficiency of black-box VI by improving different aspects of its optimization process.
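
For context, the underlying optimization problem is maximization of the evidence lower bound (ELBO) over the parameters \phi of an approximating distribution q_\phi; this is the standard formulation rather than a contribution specific to this thesis:

    \mathrm{ELBO}(\phi) = \mathbb{E}_{z \sim q_\phi}\big[\log p(x, z) - \log q_\phi(z)\big] \le \log p(x)

Black-box VI maximizes this bound using stochastic gradient estimates obtained by sampling from q_\phi, which is the source of the noise discussed above.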

The first part of this thesis focuses on the importance-weighted evidence lower bound (IW-ELBO), an objective used for the VI optimization problem. By incorporating importance sampling, the IW-ELBO augments the expressive power of the approximating distributions used in VI. However, it also increases the variance of gradient estimates, complicating the optimization process. To mitigate this, the thesis applies the theory of U-statistics, an approach that significantly reduces variance. Because a comprehensive application of U-statistics can be computationally impractical, we introduce approximate methods that capture most of the variance-reduction benefits without substantial computational overhead.
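
For reference, the K-sample IW-ELBO is the standard importance-weighted bound

    \mathcal{L}_K(\phi) = \mathbb{E}_{z_1, \ldots, z_K \sim q_\phi}\left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p(x, z_k)}{q_\phi(z_k)} \right]

The U-statistics construction, sketched here only in its generic form, draws M > K samples and averages the K-sample estimator over every size-K subset S of {1, ..., M}:

    U_{M,K}(\phi) = \binom{M}{K}^{-1} \sum_{S \subseteq \{1,\ldots,M\},\, |S| = K} \log \frac{1}{K} \sum_{k \in S} \frac{p(x, z_k)}{q_\phi(z_k)}

This estimator has the same expectation as the K-sample version but reduced variance, at the price of evaluating (or approximating) the average over subsets.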

The second part of the thesis addresses a central issue within black-box VI: its stochastic optimization process is highly sensitive to user-specified hyperparameter choices, often leading to inconsistent results. We counter this by introducing an algorithm based on the sample average approximation (SAA), specifically tailored to VI. This method, SAA for VI, transforms the stochastic optimization task into a sequence of deterministic problems that are easily solved with standard optimization techniques. The resulting method simplifies and automates the optimization process, alleviates the burden of hyperparameter tuning, and exhibits robust performance, especially with complex statistical models involving hundreds of latent variables.
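
To make the idea concrete, here is a minimal sketch of the SAA principle in the VI setting, using a diagonal-Gaussian approximation, a toy target, and an off-the-shelf L-BFGS routine; the function names and modeling choices are illustrative assumptions rather than details taken from the thesis. Freezing the Monte Carlo noise turns the negative ELBO into a deterministic function of the variational parameters.

    # Illustrative SAA-for-VI sketch: freeze the base randomness so the
    # negative ELBO is deterministic, then apply a standard optimizer.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def log_joint(z):
        # Toy stand-in for log p(x, z): a Gaussian with mean 1.0, scale 0.5.
        return norm.logpdf(z, loc=1.0, scale=0.5).sum(axis=-1)

    def fit_saa_vi(dim, num_samples=64, seed=0):
        rng = np.random.default_rng(seed)
        eps = rng.standard_normal((num_samples, dim))   # frozen base noise

        def negative_elbo(params):
            mu, log_sigma = params[:dim], params[dim:]
            z = mu + np.exp(log_sigma) * eps            # reparameterized draws
            # Closed-form entropy of a diagonal Gaussian.
            entropy = np.sum(log_sigma + 0.5 * np.log(2 * np.pi * np.e))
            return -(log_joint(z).mean() + entropy)     # deterministic given eps

        init = np.zeros(2 * dim)
        result = minimize(negative_elbo, init, method="L-BFGS-B")
        return result.x[:dim], np.exp(result.x[dim:])

    mu_hat, sigma_hat = fit_saa_vi(dim=2)
    print(mu_hat, sigma_hat)   # should approach mean 1.0 and scale 0.5

Because the objective is deterministic once the noise is fixed, convergence can be judged with ordinary stopping criteria instead of hand-tuned learning-rate schedules; the full method described above solves a sequence of such deterministic problems rather than a single one.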

In the third part, we discuss the use of reparameterized distributions within black-box VI, specifically in the context of mixture distributions. Reparameterization, a key technique in VI, enables the effective gradient estimation that the optimization process requires. Despite its effectiveness, no canonical gradient estimator exists for mixture distributions. In response, this thesis introduces and evaluates four gradient estimation methods, two of which are novel. These are assessed on their variance, computational cost, and the breadth of their applicability. The insights gained from these evaluations extend beyond black-box VI, offering wider implications for the field.
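
To illustrate why mixtures are awkward to reparameterize: sampling from a mixture \sum_c w_c\, q_c(z) typically begins by drawing a discrete component index, which blocks the usual pathwise gradient. One simple baseline, shown here only as an illustration (assuming Gaussian components for concreteness, and not necessarily one of the four estimators studied in the thesis), marginalizes over the component and reparameterizes within each component:

    \nabla_\theta \, \mathbb{E}_{z \sim q_\theta}[f(z)] = \nabla_\theta \sum_{c=1}^{C} w_c \, \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, I)}\big[ f(\mu_c + \sigma_c \odot \varepsilon) \big]

This gives unbiased and typically low-variance gradients but requires C function evaluations per sample, reflecting the trade-offs between variance, cost, and applicability along which the estimators are compared.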
 

Advisor: Dan Sheldon