PhD Dissertation Proposal: Jinlin Lai, Efficient Bayesian Inference with Automatic Marginalization
Speaker
Jinlin Lai
Abstract
Bayesian methods incorporate prior knowledge into statistical modeling. Practical Bayesian inference has two steps: modeling and inference. Domain experts encode their knowledge into probabilistic models, then use an algorithm to infer the (posterior) distribution of unobserved variables given the observed ones. To reduce user effort, probabilistic programming languages (PPLs) automate inference for arbitrary models with general-purpose algorithms. However, the posterior distributions of real-world models can be difficult to reason about, which complicates inference with those algorithms. It is widely accepted that inference algorithms work better on lower-dimensional problems and on posteriors with simpler geometry. Marginalization is a family of methods for integrating variables out of statistical models; it both reduces the dimension of the problem and simplifies the geometry of the posterior, leading to better posterior inference. We study the principles of marginalization in the context of modern Bayesian inference and propose automatic marginalization pipelines in PPLs that are efficient for applied models.
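As a concrete illustration (a standard identity, not specific to this proposal; the symbols z, y, τ, and σ are illustrative), marginalization replaces joint inference over (z, y) with inference under the marginal

\[
p(y) = \int p(y \mid z)\, p(z)\, dz,
\]

and conjugacy makes the integral available in closed form: for example, if z ~ N(0, τ²) and y | z ~ N(z, σ²), then y ~ N(0, τ² + σ²).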
In the first part of the thesis, we study methods for automatic marginalization of graphical models within PPLs. At a high level, we automatically transform users' probabilistic programs so that they become more efficient for downstream inference algorithms. In the back end, probabilistic programs are compiled into basic operations, which can be transformed into forward computation graphs for the models. At this level of abstraction, we automatically detect conjugacy relationships between random variables and apply edge-reversal operations to marginalize variables out of the graphical models. Finally, the transformed computation graphs are used for inference with Hamiltonian Monte Carlo (HMC). We show that automatic marginalization improves inference efficiency for hierarchical Bayesian models. A minimal sketch of the kind of before-and-after transformation involved appears below.
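The following sketch assumes NumPyro as the PPL; the model, data, and names are illustrative, not the transformation system from the proposal. The centered model samples per-group effects theta explicitly, while the marginalized model integrates theta out using Normal-Normal conjugacy, which is the kind of rewrite an automatic edge-reversal transform would produce before handing the model to HMC/NUTS:

# Minimal sketch (assumes NumPyro; all names and data are illustrative).
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from jax import random

y = jnp.array([2.1, -0.3, 1.7, 0.9])  # hypothetical observations
sigma = 1.0                            # known observation noise

def centered(y):
    # Samples both the global mean and per-datum effects theta.
    mu = numpyro.sample("mu", dist.Normal(0.0, 5.0))
    tau = numpyro.sample("tau", dist.HalfNormal(2.0))
    with numpyro.plate("data", y.shape[0]):
        theta = numpyro.sample("theta", dist.Normal(mu, tau))
        numpyro.sample("obs", dist.Normal(theta, sigma), obs=y)

def marginalized(y):
    # theta integrated out by conjugacy:
    # y_i | mu, tau ~ Normal(mu, sqrt(tau^2 + sigma^2))
    mu = numpyro.sample("mu", dist.Normal(0.0, 5.0))
    tau = numpyro.sample("tau", dist.HalfNormal(2.0))
    with numpyro.plate("data", y.shape[0]):
        numpyro.sample("obs", dist.Normal(mu, jnp.sqrt(tau**2 + sigma**2)), obs=y)

mcmc = numpyro.infer.MCMC(numpyro.infer.NUTS(marginalized),
                          num_warmup=500, num_samples=500)
mcmc.run(random.PRNGKey(0), y)

After sampling mu and tau from the marginalized model, theta can be recovered exactly from its conditional Gaussian distribution, so nothing is lost by the transformation.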
The second part of the thesis focuses on efficient posterior inference for linear mixed-effects models (LMMs). LMMs are regression models with both fixed and random effects, and they are widely used across scientific disciplines, including ecology, medicine, psychology, neuroscience, and cognitive science. We exploit sparse structure and fast linear algebra to perform vectorized marginalization for LMMs, which scales to large datasets, runs on modern GPUs, and improves over naive automatic marginalization. The algorithm is implemented as a PPL library function that users can call directly. We show that marginalization is beneficial whenever it is applicable and highlight improvements in various models, especially ones from the cognitive sciences. The sketch below illustrates the underlying identity in its dense form.
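This is a dense, O(n³) illustration only; the thesis exploits sparsity and GPU linear algebra, and all sizes and names here are hypothetical. For y = Xβ + Zu + ε with u ~ N(0, G) and ε ~ N(0, σ²I), integrating out the random effects u gives the marginal y ~ N(Xβ, ZGZᵀ + σ²I):

# Dense sketch of LMM marginalization (illustrative only).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, p, q = 50, 3, 4                       # hypothetical sizes
X = rng.normal(size=(n, p))              # fixed-effect design
Z = rng.normal(size=(n, q))              # random-effect design
b = rng.normal(size=p)                   # fixed effects
G = np.eye(q)                            # random-effect covariance
s2 = 0.5                                 # residual variance

# Simulate data from the generative model y = X b + Z u + eps.
u = rng.multivariate_normal(np.zeros(q), G)
y = X @ b + Z @ u + rng.normal(scale=np.sqrt(s2), size=n)

# Marginal log-likelihood of y with u integrated out.
cov = Z @ G @ Z.T + s2 * np.eye(n)
logp = multivariate_normal(mean=X @ b, cov=cov).logpdf(y)
print(logp)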
The third part of the thesis concerns correcting the bias of the adjoint Laplace approximation (LA) in Bayesian inference. In many practical models, no conjugacy relationship exists among the variables, which prevents exact marginalization. Prior work uses LA to approximately marginalize the latent variables of latent Gaussian models (LGMs) by replacing their conditional distribution with a Gaussian. In practice, LA biases the original model, and the bias propagates to all later computations. We propose to correct the bias in two ways. First, we can define a pseudo-model that shares the same marginal as the original model by constructing an importance-sampling estimator around LA. Second, we can generalize LA with quasi-Monte Carlo sequences that make approximate marginalization asymptotically unbiased. We will explore the two ideas and their combination on practical LGMs. A one-dimensional sketch of the importance-sampling correction follows.
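In this sketch, the model (z ~ N(0, 1), y | z ~ Poisson(e^z)) and all names are illustrative, not the LGMs studied in the proposal. LA fits a Gaussian at the mode of the joint density, giving a biased estimate of the marginal p(y); importance sampling with the LA Gaussian as the proposal yields an unbiased estimate of the same quantity:

# 1-D sketch: Laplace approximation (biased) vs. IS correction (unbiased).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, poisson

y = 3  # a hypothetical observation

def neg_log_joint(z):
    # -log p(y, z) for z ~ N(0, 1), y | z ~ Poisson(exp(z))
    return -(poisson.logpmf(y, np.exp(z)) + norm.logpdf(z))

# Laplace approximation: Gaussian at the mode of the joint density.
zhat = minimize_scalar(neg_log_joint).x
h = 1e-3  # step for a numerical second derivative at the mode
curv = (neg_log_joint(zhat + h) - 2 * neg_log_joint(zhat)
        + neg_log_joint(zhat - h)) / h**2
sd = 1.0 / np.sqrt(curv)
la_marginal = np.exp(-neg_log_joint(zhat)) * np.sqrt(2 * np.pi) * sd

# IS correction: unbiased, since E_q[p(y, z) / q(z)] = p(y).
zs = norm.rvs(loc=zhat, scale=sd, size=10_000, random_state=0)
log_w = (-np.array([neg_log_joint(z) for z in zs])
         - norm.logpdf(zs, zhat, sd))
is_marginal = np.exp(log_w).mean()
print(la_marginal, is_marginal)

The second idea would replace the i.i.d. proposal draws with a quasi-Monte Carlo sequence over the same proposal.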
This thesis explores automatic marginalization at different levels of abstraction in probabilistic programming. The methods could be used to design more efficient and more automated probabilistic programming languages, facilitating future applications of Bayesian methods.
Advisor
Dan Sheldon