Speaker

Pratheba Selvaraju

Abstract

3D reconstruction from real-world data is essential in applications like augmented reality, robotics, medical imaging, and autonomous navigation. However, this data is often noisy, incomplete, occluded, or corrupted. Despite these imperfections, such data must be used to develop reconstruction methods that can operate in real-world, real-time scenarios. Additionally, each application has its own requirements and constraints, and achieving the best possible outcome depends on selecting representations suited to each case. Given the wide range of applications, grouping them by shared characteristics, such as rigid versus deformable objects, allows a targeted approach applicable to similar scenarios. Building on this categorization, this thesis addresses reconstruction tasks for static (rigid) and dynamic (non-rigid) structures, exploring the representations best suited to each.

We begin with static structure reconstruction for urban planning and development, which primarily deals with non-malleable material constraints. To address this, we introduce Developability Approximation for Neural Implicits through Rank Minimization, a neural network model that represents surfaces as piecewise patches of zero Gaussian curvature. The model encodes data implicitly, offering an advantage over prior explicit methods that struggle with high tessellation and shape fidelity. To extend this single-surface reconstruction method to multi-component urban building planning, we created BuildingNet (introduced in BuildingNet: Learning to Label 3D Buildings), a large-scale dataset of 2,000 diverse building exteriors (e.g., residential, commercial, stadium). Using this dataset, we developed a Graph Neural Network (GNN) model to label building components. The developability approximation method can then be applied to specific labeled components to simulate and evaluate designs, costs, and feasibility for planning.
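For intuition on the rank-minimization connection, recall a standard curvature identity for implicit surfaces (this is textbook differential geometry, not necessarily the thesis's exact objective): for a surface defined by f(x) = 0, the Gaussian curvature K can be written via the Hessian H(f) and its adjugate H*(f) as

```latex
K \;=\; \frac{\nabla f^{\top}\, H^{*}(f)\, \nabla f}{\lVert \nabla f \rVert^{4}}
```

Developable surfaces are exactly those with K identically zero, and K vanishes precisely where the Hessian restricted to the tangent plane drops rank. This is why enforcing developability on a neural implicit can be posed as minimizing the rank of second-derivative quantities of the network.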

Next, we explore dynamic object reconstruction, focusing on human faces, with real-world applications in forensic science, medical imaging, animation, and telepresence. We introduce OFER: Occluded Face Expression Reconstruction, which reconstructs expressive human faces from occluded images. It employs a parametric face model that encodes facial features, enabling smooth reconstruction and easy animatability by adjusting the model parameters. This is achieved by training UNet-based diffusion models to generate varied expression parameters for the occluded regions.
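The multi-hypothesis aspect can be sketched in miniature: because diffusion sampling starts from independent noise, repeated draws yield varied but plausible parameter sets for content the image does not constrain. The denoiser below is a hypothetical stand-in (a toy contraction toward one "plausible" mode), not OFER's actual network or parameterization.

```python
# Toy sketch: sampling multiple expression-parameter hypotheses by
# running a reverse-diffusion-style loop from different noise draws.
# toy_denoiser is a hypothetical stand-in for a learned network.
import random

def toy_denoiser(params, t):
    # Stand-in for a learned denoiser: nudge parameters toward a
    # fixed "plausible expression" mode (illustrative only).
    target = [0.5] * len(params)
    return [p + 0.2 * (g - p) for p, g in zip(params, target)]

def sample_expression(dim=4, steps=20, seed=None):
    rng = random.Random(seed)
    # Start from pure Gaussian noise, as in diffusion sampling.
    params = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(steps)):
        params = toy_denoiser(params, t)
    return params

# Different noise seeds yield different but plausible hypotheses,
# mirroring how repeated sampling covers occluded-region ambiguity.
hyps = [sample_expression(seed=s) for s in range(3)]
```

Each hypothesis lands near the same mode but retains seed-dependent variation; a real model would instead cover the distinct expressions consistent with the visible face region.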

In facial animation, real-time performance is crucial for applications like gaming and augmented reality, which require computational efficiency while preserving high quality. Traditional UNet-based diffusion models often suffer from slow inference. To tackle this, we explore efficient computational representations and introduce FORA: Fast-Forward Caching for Diffusion Transformer Acceleration. FORA employs a caching mechanism that reuses intermediate outputs, minimizing computational overhead without requiring model retraining and enabling faster processing with minimal trade-offs in quality.
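The caching idea can be illustrated with a minimal sketch (the class names and fixed caching interval here are illustrative, not FORA's actual implementation): block outputs computed on a "full" denoising step are cached and reused on the intermediate steps, cutting the number of expensive block evaluations.

```python
# Minimal sketch of interval-based feature caching across diffusion
# steps, in the spirit of FORA. Names and interval policy are
# illustrative assumptions, not the paper's API.

class CachedBlock:
    def __init__(self, fn):
        self.fn = fn        # the expensive block computation
        self.cache = None   # last fully-computed output
        self.calls = 0      # count of real (uncached) computations

    def __call__(self, x, recompute):
        if recompute or self.cache is None:
            self.cache = self.fn(x)
            self.calls += 1
        return self.cache   # reuse cached features otherwise

def denoise(blocks, x, num_steps, interval):
    """Run num_steps denoising steps, recomputing block features
    only every `interval` steps and reusing cached ones otherwise."""
    for step in range(num_steps):
        recompute = (step % interval == 0)
        for block in blocks:
            x = block(x, recompute)
    return x

# Toy "transformer blocks": each just scales its input.
blocks = [CachedBlock(lambda v: v * 0.9) for _ in range(4)]
out = denoise(blocks, 1.0, num_steps=12, interval=3)
total_calls = sum(b.calls for b in blocks)
```

With 12 steps, 4 blocks, and an interval of 3, only 16 block evaluations run instead of 48; no retraining is involved, since caching only changes when existing computations are reused.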

Advisor

Erik Learned-Miller

Hybrid event posted in PhD Thesis Defense