View publication

Texture cues on 3D objects are key to compelling visual representations, with the possibility to create high visual fidelity with inherent spatial consistency across different views. Since the availability of textured 3D shapes remains very limited, learning a 3D-supervised data-driven method that predicts a texture based on the 3D input is very challenging. We thus propose Texturify, a GAN-based method that leverages a 3D shape dataset of an object class and learns to reproduce the distribution of appearances observed in real images by generating high-quality textures. In particular, our method does not require any 3D color supervision or correspondence between shape geometry and images to learn the texturing of 3D objects. Texturify operates directly on the surface of the 3D objects by introducing face convolutional operators on a hierarchical 4-RoSy parametrization to generate plausible object-specific textures. Employing differentiable rendering and adversarial losses that critique individual views and consistency across views, we effectively learn the high-quality surface texturing distribution from real-world images. Experiments on car and chair shape collections show that our approach outperforms state of the art by an average of 22% in FID score.

Related readings and updates.

Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents

The perception system in personalized mobile agents requires developing indoor scene understanding models, which can understand 3D geometries, capture objectiveness, analyze human behaviors, etc. Nonetheless, this direction has not been well-explored in comparison with models for outdoor environments (e.g., the autonomous driving system that includes pedestrian prediction, car detection, traffic sign recognition, etc.). In this paper, we first…
See paper details

High Fidelity 3D Reconstructions with Limited Physical Views

Multi-view triangulation is the gold standard for 3D reconstruction from 2D correspondences, given known calibration and sufficient views. However in practice expensive multi-view setups — involving tens sometimes hundreds of cameras — are required to obtain the high fidelity 3D reconstructions necessary for modern applications. In this work we present a novel approach that leverages recent advances in 2D-3D lifting using neural shape priors…
See paper details