
Machine Learning and Friends Lunch (Online)

Thursday, 03/24/2022 12:00pm to 1:00pm
Virtual via Zoom

Title: Learning Compositional Representations for Understanding and Generating 3D Environments with Minimal Supervision and Maximum Controllability

Abstract: Within the first year of our life, we develop a common-sense understanding of the physical behavior of the world, which relies heavily on our ability to properly reason about the arrangement of objects in a scene. While this seems to be a fairly easy task for the human brain, computer vision algorithms struggle to form such high-level reasoning. The research community has therefore shifted its attention to the development of primitive-based methods that seek to represent objects as semantically consistent part arrangements. However, due to the simplicity of existing primitive representations, these methods fail to accurately reconstruct 3D shapes using a small number of primitives/parts.

In the first part of my talk, I will address the trade-off between reconstruction quality and the number of parts and present Neural Parts, a novel 3D primitive representation that defines primitives using an Invertible Neural Network (INN), which implements homeomorphic mappings between a sphere and the target object. Since a homeomorphism does not impose any constraints on the primitive shape, our model effectively decouples geometric accuracy from parsimony and, as a result, captures complex geometries with an order of magnitude fewer primitives.

In the second part of my talk, we will look into the problem of inferring, and subsequently generating, semantically meaningful object arrangements to populate 3D scenes conditioned on the room shape. In particular, I will present ATISS, a novel autoregressive transformer architecture for creating diverse and plausible synthetic indoor environments as unordered sets of objects. Our unordered set formulation allows us to use the same trained model for a variety of interactive applications, such as general scene completion, partial room rearrangement with any objects specified by the user, and object suggestions for any partial room. This is an important step towards fully automatic content creation.

Bio: Despoina Paschalidou is a postdoc at Stanford University working with Prof. Leo Guibas at the Geometric Computation Group. Prior to this, she did her PhD at the Max Planck Institute for Intelligent Systems in Tübingen and the Computer Vision Lab at ETH Zurich, under the guidance of Prof. Andreas Geiger and Prof. Luc Van Gool. She received her Diploma in Electrical and Computer Engineering from the Aristotle University of Thessaloniki in 2015. Her research interests revolve around semantic and interpretable representations of 3D objects and scenes. She spent one year working with Prof. Sanja Fidler at NVIDIA Research on developing interactive tools for content creation. She also spent six months at FAIR working with Prof. Andrea Vedaldi and David Novotny on unsupervised 3D reconstruction from video data.

To obtain the Zoom link for this event, please see the event announcements from MLFL on the college email lists or contact Wenlong Zhao at wenlongzhao [at] cs.umass.edu (subject: MLFL Zoom Link).
