View publication

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, where we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generative model that enables both unconditional and conditional generation of 3D scenes. Our model generalizes previous works that focus on single objects by removing the assumption that the camera pose distribution can be shared across samples. We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets and allows for conditional generation of 3D scenes given conditioning variables like sparse image observations or text that describes the scene.



Highlighted Qualitative Results


Related readings and updates.

Learning to Generate Radiance Fields of Indoor Scenes

People have an innate capability to understand the 3D visual world and make predictions about how the world could look from different points of view, even when relying on few visual observations. We have this spatial reasoning ability because of the rich mental models of the visual world we develop over time. These mental models can be interpreted as a prior belief over which configurations of the visual world are most likely to be observed. In this case, a prior is a probability distribution over the 3D visual world.

See highlight details

RetrievalFuse: Neural 3D Scene Reconstruction with a Database

3D reconstruction of large scenes is a challenging problem due to the high-complexity nature of the solution space, in particular for generative neural networks. In contrast to traditional generative learned models which encode the full generative process into a neural network and can struggle with maintaining local details at the scene level, we introduce a new method that directly leverages scene geometry from the training database. First, we…
See paper details