Sharp Monocular View Synthesis in Less Than a Second

Authors: Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan R. Richter, Vladlen Koltun

We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP delivers robust zero-shot generalization across datasets. It sets a new state of the art on multiple datasets, reducing LPIPS by 25-34% and DISTS by 21-43% versus the best prior model, while lowering the synthesis time by three orders of magnitude.
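
The abstract does not spell out SHARP's exact Gaussian parameterization. The sketch below assumes the common 3D Gaussian splatting layout: a mean, per-axis scales, a rotation quaternion, an opacity and an RGB color, with covariance Σ = R S Sᵀ Rᵀ. The names `Gaussian3D`, `covariance` and `project` are hypothetical, chosen for illustration. The sketch also assumes metric units, a pinhole camera, and a novel view that differs from the input view by a pure translation. It only projects Gaussian centers and does not render splats. Its purpose is to show why a metric representation makes camera moves expressible directly in meters.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian3D:
    """One scene primitive (hypothetical layout, not SHARP's confirmed format).

    Positions and scales are assumed to be in meters, expressed in the
    input camera's frame (x right, y down, z forward).
    """
    mean: tuple[float, float, float]               # center in meters
    scale: tuple[float, float, float]              # per-axis standard deviations in meters
    rotation: tuple[float, float, float, float]    # quaternion (w, x, y, z)
    opacity: float                                 # in [0, 1]
    color: tuple[float, float, float]              # RGB in [0, 1]


def quat_to_matrix(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix, normalizing first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g):
    """Return Sigma = R S S^T R^T, computed as M M^T with M = R S (S diagonal)."""
    r = quat_to_matrix(g.rotation)
    m = [[r[i][j] * g.scale[j] for j in range(3)] for i in range(3)]
    return [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)] for i in range(3)]


def project(g, cam_offset, focal, cx, cy):
    """Project the Gaussian's center into a camera translated by cam_offset (meters).

    Assumes the novel camera keeps the input camera's orientation (pure
    translation). Returns pixel coordinates, or None if the center is
    behind the camera.
    """
    x, y, z = (g.mean[i] - cam_offset[i] for i in range(3))
    if z <= 0:
        return None
    return (focal * x / z + cx, focal * y / z + cy)
```

For example, take a Gaussian centered 2 m in front of the input camera with a focal length of 500 px. Moving the camera 0.1 m to the right shifts its projection by 500 · 0.1 / 2 = 25 px, from (320, 240) to (295, 240). This computation is only meaningful because the representation has absolute scale. With an up-to-scale reconstruction, "0.1 m" would have no fixed meaning.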

Figure 1: SHARP synthesizes a photorealistic 3D representation from a single photograph in less than a second. Top: Input image; Bottom: Novel view synthesized by SHARP. The synthesized representation supports high-resolution rendering of nearby views, with sharp details and fine structures, at more than 100 frames per second on a standard GPU.