
Despite their output being ultimately consumed by human viewers, 3D Gaussian Splatting (3DGS) methods often rely on ad-hoc combinations of pixel-level losses, resulting in blurry renderings. To address this, we systematically explore perceptual optimization strategies for 3DGS by searching over a diverse set of distortion losses. We conduct the first-of-its-kind large-scale human subjective study on 3DGS, involving 39,320 pairwise ratings across several datasets and 3DGS frameworks. A regularized version of Wasserstein Distortion, which we call WD-R, emerges as the clear winner, excelling at recovering fine textures without incurring a higher splat count. WD-R is preferred by raters more than 2.3× as often as the original 3DGS loss, and 1.5× as often as the current best method, Perceptual-GS. WD-R also consistently achieves state-of-the-art LPIPS, DISTS, and FID scores across various datasets, and generalizes across recent frameworks such as Mip-Splatting and Scaffold-GS, where replacing the original loss with WD-R consistently enhances perceptual quality within a similar resource budget (number of splats for Mip-Splatting, model size for Scaffold-GS), and leads to reconstructions preferred by human raters 1.8× and 3.6× as often, respectively. We also find that this carries over to the task of 3DGS scene compression, with ≈50% bitrate savings for comparable perceptual metric performance.
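To make the idea concrete, the sketch below shows a simplified, NumPy-only version of a Wasserstein-style distortion between two grayscale images: fit a Gaussian to the pixel statistics of each local patch and compare the two Gaussians with the closed-form 2-Wasserstein distance, (μ₁ − μ₂)² + (σ₁ − σ₂)². The `wd_r` variant adds a pixel-level regularizer. Note this is an illustrative toy, not the paper's WD-R: the pooling scheme, feature space, and regularization weight `lam` here are all assumptions for illustration.

```python
import numpy as np

def local_stats(x, k=4):
    # Mean and std over non-overlapping k x k patches
    # (a crude stand-in for the multiscale pooling used in practice).
    h, w = x.shape
    p = x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return p.mean(axis=(1, 3)), p.std(axis=(1, 3))

def wasserstein_distortion(x, y, k=4):
    # Squared 2-Wasserstein distance between 1D Gaussians fit to each
    # patch's statistics: (mu_x - mu_y)^2 + (sigma_x - sigma_y)^2,
    # averaged over patches. Matching textures with different pixel
    # layouts score low, which is what encourages sharp texture.
    mx, sx = local_stats(x, k)
    my, sy = local_stats(y, k)
    return float(np.mean((mx - my) ** 2 + (sx - sy) ** 2))

def wd_r(x, y, lam=0.1, k=4):
    # Hypothetical regularized variant: add a pixel-level MSE term so
    # the loss is not blind to spatial alignment (lam is illustrative).
    return wasserstein_distortion(x, y, k) + lam * float(np.mean((x - y) ** 2))
```

In a 3DGS training loop, a loss of this form would simply replace the usual L1 + D-SSIM objective on each rendered view, with gradients flowing back to the splat parameters.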


Figure 1: 3DGS representation and compression frameworks optimized using 2D distortion and rate-distortion objectives, incorporating perceptual losses as part of the training framework.
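For the compression setting, the rate-distortion objective sketched in Figure 1 can be written in the generic form below, where the perceptual distortion replaces the usual pixel-level term. This is the standard Lagrangian formulation, not necessarily the paper's exact objective:

```latex
\mathcal{L}(\theta) \;=\; D\big(x,\hat{x}(\theta)\big) \;+\; \lambda\, R(\theta)
```

Here $\hat{x}(\theta)$ is the rendered view, $D$ the (perceptual) 2D distortion, $R(\theta)$ the bitrate of the splat representation, and $\lambda$ trades off rate against distortion.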


Figure 2: Bayesian Elo scores for 3DGS representation methods across indoor scenes (Deep Blending, Mip-NeRF 360 indoor), outdoor scenes (Tanks & Temples, Mip-NeRF 360 outdoor, and BungeeNeRF), and all scenes combined. WD-R and WD achieve the highest scores in all settings (within the 95% confidence interval).

Related readings and updates.

Generating high-quality 3D content requires models capable of learning robust distributions of complex scenes and the real-world objects within them. Recent Gaussian-based 3D reconstruction techniques have achieved impressive results in recovering high-fidelity 3D assets from sparse input images by predicting 3D Gaussians in a feed-forward manner. However, these techniques often lack the extensive priors and expressiveness offered by Diffusion…


Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS), which represents an animatable human together with the scene using 3D Gaussian Splatting…
