View publication

We propose a framework for learning neural scene representations directly from images, without 3D supervision. Our key insight is that 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. Specifically, we introduce a loss which enforces equivariance of the scene representation with respect to 3D transformations. Our formulation allows us to infer and render scenes in real time while achieving comparable results to models requiring minutes for inference. In addition, we introduce two challenging new datasets for scene representation and neural rendering, including scenes with complex lighting and backgrounds. Through experiments, we show that our model achieves compelling results on these datasets as well as on standard ShapeNet benchmarks.

Related readings and updates.

Apple at ICML 2020

Apple sponsored the thirty-seventh International Conference on Machine Learning (ICML), which was held virtually from July 12 to 18. ICML is a leading global gathering dedicated to advancing the machine learning field.

See event details

Variational Saccading: Efficient Inference for Large Resolution Images

Image classification with deep neural networks is typically restricted to images of small dimensionality such as R224 x 244 in Resnet models [24]. This limitation excludes the R4000 x 3000 dimensional images that are taken by modern smartphone cameras and smart devices. In this work, we aim to mitigate the prohibitive inferential and memory costs of operating in such large dimensional spaces. To sample from the high-resolution original input…
See paper details