Paper, April 2024

JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling

Authors: Jingyang Zhang, Shiwei Li, Yuanxun Lu, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan, Yao Yao

We introduce JointNet, a novel neural network architecture for modeling the joint distribution of images and an additional dense modality (e.g., depth maps). JointNet extends a pre-trained text-to-image diffusion model: a copy of the original network is created as the branch for the new dense modality and is densely connected with the RGB branch. The RGB branch is locked during fine-tuning, which enables efficient learning of the new modality's distribution while preserving the strong generalization ability of the large-scale pre-trained diffusion model. Using RGBD diffusion as an example, we demonstrate the effectiveness of JointNet through extensive experiments and showcase its use in a variety of applications, including joint RGBD generation, dense depth prediction, depth-conditioned image generation, and coherent tile-based 3D panorama generation.

Figure 1: JointNet supports multiple downstream tasks.
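This page does not say how the conditional tasks (such as depth-conditioned image generation) are obtained from the joint model. One common approach for joint diffusion models is replacement-based conditioning, borrowed from diffusion inpainting: at every reverse step the known modality is overwritten with a copy of itself noised to the current level, and only the other modality is actually sampled. The pure-Python sketch below illustrates that idea under stated assumptions and is not presented as JointNet's actual procedure: `toy_denoiser` is a hypothetical stand-in for a trained joint network that predicts clean RGB and depth, the constant-β noise schedule and the deterministic DDIM update (η = 0) are illustrative choices, and all function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical joint denoiser: (noisy rgb, noisy depth, step t) -> predicted clean (rgb, depth).
Denoiser = Callable[[Vector, Vector, int], Tuple[Vector, Vector]]


def alpha_bars(num_steps: int, beta: float = 0.1) -> List[float]:
    # Cumulative products of (1 - beta) for a constant-beta schedule (illustrative choice).
    out, prod = [], 1.0
    for _ in range(num_steps):
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_to(x0: Vector, abar: float, rng: random.Random) -> Vector:
    # Forward process: x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps, with eps ~ N(0, 1).
    return [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0) for x in x0]


def sample_with_known_depth(
    denoise: Denoiser, known_depth: Vector, rgb_dim: int, num_steps: int = 50, seed: int = 0
) -> Tuple[Vector, Vector]:
    """Sample RGB while pinning the depth branch to a given depth map (replacement conditioning)."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    rng = random.Random(seed)
    abars = alpha_bars(num_steps)
    rgb = [rng.gauss(0.0, 1.0) for _ in range(rgb_dim)]  # free modality starts as noise
    depth = noise_to(known_depth, abars[-1], rng)  # known modality at the starting noise level
    for t in range(num_steps - 1, 0, -1):
        rgb0_hat, _ = denoise(rgb, depth, t)  # the model's depth prediction is discarded
        a_t, a_prev = abars[t], abars[t - 1]
        # Deterministic DDIM update (eta = 0) for the free RGB modality.
        eps_hat = [(x - math.sqrt(a_t) * x0) / math.sqrt(1.0 - a_t) for x, x0 in zip(rgb, rgb0_hat)]
        rgb = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e for x0, e in zip(rgb0_hat, eps_hat)]
        # Overwrite depth with the known map, re-noised to the next (lower) noise level.
        depth = noise_to(known_depth, a_prev, rng)
    rgb0_hat, _ = denoise(rgb, depth, 0)  # final step: take the predicted clean image
    return rgb0_hat, list(known_depth)


def toy_denoiser(rgb: Vector, depth: Vector, t: int) -> Tuple[Vector, Vector]:
    # Hypothetical stand-in for a trained joint model: always predicts a mid-gray image.
    return [0.5] * len(rgb), list(depth)


rgb, depth = sample_with_known_depth(toy_denoiser, known_depth=[1.0, 2.0, 3.0], rgb_dim=3)
```

Swapping the roles of the two modalities (pinning RGB and sampling depth) would give the dense depth prediction case under the same assumption.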

Figure 2: Network architecture of JointNet and a comparison with other alternatives.
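As a rough illustration of the structure described in the abstract, the following pure-Python toy model keeps a locked copy of "pre-trained" layers as the RGB branch, creates a trainable copy of the same layers as the dense-modality branch, and links the two branches after every layer. It is a sketch under assumptions, not the authors' implementation: the layers are toy matrix-vector products rather than diffusion U-Net blocks, the additive cross-branch links and their zero initialization are illustrative choices not confirmed by this page, and the names `Linear`, `JointNetSketch`, and `branch_forward` are hypothetical.

```python
import copy
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class Linear:
    weight: Matrix
    trainable: bool = True

    def __call__(self, v: Vector) -> Vector:
        # Matrix-vector product (no bias) standing in for one network layer.
        return [sum(w * x for w, x in zip(row, v)) for row in self.weight]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def zero_matrix(dim: int) -> Matrix:
    return [[0.0] * dim for _ in range(dim)]


def branch_forward(layers: List[Linear], x: Vector) -> Vector:
    # Single-branch forward pass: stands in for the original pre-trained model.
    for layer in layers:
        x = relu(layer(x))
    return x


class JointNetSketch:
    """Toy two-branch model: locked RGB branch plus a trainable copy for a dense modality."""

    def __init__(self, pretrained: List[Linear], dim: int) -> None:
        # RGB branch keeps the pre-trained weights and is locked.
        self.rgb = copy.deepcopy(pretrained)
        for layer in self.rgb:
            layer.trainable = False
        # Dense branch (e.g. depth) starts as a copy of the same weights and stays trainable.
        self.dense = copy.deepcopy(pretrained)
        # Hypothetical per-layer links in both directions, zero-initialized (an assumption)
        # so the RGB branch reproduces the pre-trained output before fine-tuning.
        self.dense_to_rgb = [Linear(zero_matrix(dim)) for _ in pretrained]
        self.rgb_to_dense = [Linear(zero_matrix(dim)) for _ in pretrained]

    def forward(self, x_rgb: Vector, x_dense: Vector) -> Tuple[Vector, Vector]:
        h_rgb, h_dense = x_rgb, x_dense
        for i in range(len(self.rgb)):
            a = relu(self.rgb[i](h_rgb))
            b = relu(self.dense[i](h_dense))
            # Exchange features between the two branches after every layer.
            h_rgb = add(a, self.dense_to_rgb[i](b))
            h_dense = add(b, self.rgb_to_dense[i](a))
        return h_rgb, h_dense

    def trainable_parameters(self) -> List[Linear]:
        # Only the dense branch and the cross-branch links would receive gradient updates.
        modules = self.rgb + self.dense + self.dense_to_rgb + self.rgb_to_dense
        return [m for m in modules if m.trainable]


pretrained = [Linear([[1.0, 0.5], [-0.5, 1.0]]), Linear([[0.2, 0.0], [0.0, 2.0]])]
net = JointNetSketch(pretrained, dim=2)
rgb_out, dense_out = net.forward([1.0, 2.0], [0.3, 0.1])
```

Because the links start at zero in this sketch, `rgb_out` equals `branch_forward(pretrained, [1.0, 2.0])` before any fine-tuning, and only the dense branch and the links (six modules here) are reported as trainable. The point of the design being illustrated is that the RGB branch keeps its pre-trained behavior at initialization, while the new branch starts from pre-trained weights rather than random ones.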

Related readings and updates.

Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency

July 11, 2025 | Research area: Methods and Algorithms | Conference: ICML

The adoption of text-to-image diffusion models raises concerns over reliability, drawing scrutiny under the lens of various metrics like calibration, fairness, or compute efficiency. We focus in this work on two issues that arise when deploying these models: a lack of diversity when prompting images, and a tendency to recreate images from the training set. To solve both problems, we propose a method that coaxes the sampled trajectories of…

Improving GFlowNets for Text-to-Image Diffusion Alignment

July 17, 2024 | Research areas: Computer Vision, Methods and Algorithms | Workshop at ICML

This paper was accepted at the Foundation Models in the Wild workshop at ICML 2024.

Diffusion models, which are trained to match the distribution of the training dataset, have become the de facto approach for generating visual data. Beyond matching the data distribution, we also want to control generation to fulfill desired properties, such as alignment to a text description, which can be specified with a black-box reward function. Prior works fine-tune pretrained…
