PhD Dissertation Proposal: Fabien Delattre, Diffusion Priors for Inverse Problems in Computer Vision
Speaker: Fabien Delattre
Abstract:
Progress in generative models has surged in recent years. In particular, diffusion models have proven able to capture complex distributions over images, videos, and 3D data. Used as priors, these models can help solve a wide range of inverse problems, including image inpainting, deblurring, and super-resolution. Unlike discriminative models, which must be trained separately for each task, generative priors enable inference-time optimization subject to observational and domain constraints. In this thesis, we explore how recent progress in generative modeling improves solutions to inverse problems.
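To make the setting concrete (the notation here is ours, not taken verbatim from the proposal): given a measurement y = A(x) + n of an unknown image x, posterior sampling with a diffusion prior rests on the score decomposition

    \nabla_x \log p(x \mid y) = \nabla_x \log p(x) + \nabla_x \log p(y \mid x),

where the first term is supplied by the pretrained diffusion model and the second term injects the observational (or, in the first project, classifier-based) constraint at inference time.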
First, we revisit activation maximization for feature visualization and cast it as an inverse problem. We view the classifier as the forward map from images to class scores; the inverse problem is then to recover inputs on the natural-image manifold that achieve a target activation, revealing what the classifier has learned without access to training data or class names. Although the literature on activation maximization is vast, current methods still yield fooling artifacts or blurry class averages. We combine a pretrained text-conditioned diffusion prior with gradient-based guidance from the target classifier. We hypothesize that the set of images belonging to a class can be described concisely in natural language, and we therefore directly optimize the token embeddings that condition the diffusion model. This drives the model to synthesize images that maximize the classifier's activation for the target class while remaining likely under the diffusion prior. Unlike standard classifier-guidance methods, ours does not require the classifier to be trained on noisy inputs at each diffusion noise level.
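As a rough sketch of this optimization loop (our own illustration under stated assumptions: generate_image stands in for a differentiable sampler of the text-conditioned diffusion model, classifier for the frozen target classifier, and all hyperparameter values are placeholders):

    import torch

    def visualize_class(generate_image, classifier, target_class,
                        num_tokens=8, embed_dim=768, steps=200, lr=0.05):
        # Learnable token embeddings that condition the diffusion model.
        embeds = torch.randn(1, num_tokens, embed_dim, requires_grad=True)
        opt = torch.optim.Adam([embeds], lr=lr)
        for _ in range(steps):
            img = generate_image(embeds)      # sample stays near the natural-image manifold
            logits = classifier(img)          # forward map: image -> class scores
            loss = -logits[0, target_class]   # maximize the target activation
            opt.zero_grad()
            loss.backward()                   # gradients flow back into the token embeddings
            opt.step()
        return generate_image(embeds).detach()

In this sketch the classifier only ever sees fully generated images, which is consistent with the point above that it need not be trained at multiple noise levels.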
Second, we propose to leverage diffusion models for inpainting categorical fields. Our objective is to reconstruct semantic maps, such as land-cover maps or segmentation masks, from partial or noisy observations. We focus on an ecological application in which river networks in aerial and satellite imagery are frequently occluded by clouds. Instead of operating in RGB space, we directly model the distribution of segmentation labels with a discrete diffusion process. Conditioned on the visible labels, posterior sampling then produces plausible amodal segmentations that recover river continuity across occluded regions.
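A minimal sketch of how such conditional sampling could look (our illustration, not the proposal's implementation: reverse_step and forward_noise are hypothetical callables for one reverse step of the discrete diffusion model and for corrupting labels to a given noise level, and the overwrite-with-noised-observation scheme is one common choice, in the spirit of RePaint, rather than the method the proposal commits to):

    import torch

    def inpaint_labels(reverse_step, forward_noise, observed, visible_mask,
                       num_classes, num_steps=1000):
        # observed: (H, W) integer label map; visible_mask: (H, W) bool, True where labels are known.
        x = torch.randint(0, num_classes, observed.shape)   # fully noised categorical field
        for t in reversed(range(1, num_steps + 1)):
            x = reverse_step(x, t)                           # one reverse (denoising) step
            x_obs = forward_noise(observed, t - 1)           # observation corrupted to the current level
            x = torch.where(visible_mask, x_obs, x)          # keep visible labels, fill occlusions
        return x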
Advisor: Erik Learned-Miller