Classifier-Free Guidance Is a Predictor-Corrector

Authors: Arwen Bradley, Preetum Nakkiran

This paper was accepted at the Mathematics of Modern Machine Learning (M3L) Workshop at NeurIPS 2024.

We investigate the unreasonable effectiveness of classifier-free guidance (CFG). CFG is the dominant method of conditional sampling for text-to-image diffusion models, yet unlike other aspects of diffusion, it remains on shaky theoretical footing. In this paper, we disprove common misconceptions by showing that CFG interacts differently with DDPM and DDIM, and that neither sampler with CFG generates the gamma-powered distribution. Then, we clarify the behavior of CFG by showing that it is a kind of Predictor-Corrector (PC) method that alternates between denoising and sharpening, which we call Predictor-Corrector Guidance (PCG). We show that in the SDE limit, DDPM-CFG is equivalent to PCG with a DDIM predictor applied to the conditional distribution and a Langevin dynamics corrector applied to a gamma-powered distribution. While the standard PC corrector applies to the conditional distribution and improves sampling accuracy, our corrector sharpens the distribution.
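
For readers unfamiliar with the term, the "gamma-powered distribution" and the CFG score can be written out as below. The notation is an assumption borrowed from common CFG conventions, not from the abstract itself: p_t is the data distribution noised to time t, c is the conditioning signal, and gamma is the guidance scale.

```latex
\begin{align}
  % gamma-powered distribution at noise level t
  p_{t,\gamma}(x \mid c) &\propto p_t(x)^{1-\gamma}\, p_t(x \mid c)^{\gamma}, \\
  % its score is the usual CFG combination of the two learned scores
  \nabla_x \log p_{t,\gamma}(x \mid c)
    &= (1-\gamma)\,\nabla_x \log p_t(x) + \gamma\,\nabla_x \log p_t(x \mid c).
\end{align}
```

Under this notation, the misconception would be that running a sampler with the second line as its score produces samples from p_{0,gamma}. One way to see why this can fail is that p_{t,gamma} is in general not the result of noising p_{0,gamma} to time t, so the CFG scores do not describe a valid forward diffusion process.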

Figure 1: Classifier-free guidance has become an essential part of modern text-to-image diffusion generation, but is still poorly understood. We prove that CFG is a kind of predictor-corrector that alternates between denoising and sharpening, that is, an annealed Langevin dynamics on sharpened distributions.
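
The alternation between denoising and sharpening can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation. It uses a hypothetical 1D setup where the unconditional data is a two-mode Gaussian mixture and the conditional data is one of those modes, so both scores are available in closed form. It also uses a variance-exploding noise schedule and an Euler probability-flow step as a stand-in for the DDIM predictor. All names (`pcg_sample`, `score_cfg`, `M`) and all hyperparameters are illustrative choices.

```python
import numpy as np

# Hypothetical toy data (an assumption for illustration):
#   p(x)     = 0.5 * N(-M, 1) + 0.5 * N(M, 1)   (unconditional)
#   p(x | c) = N(M, 1)                          (conditional)
# Noising with x_sigma = x_0 + sigma * noise turns each unit variance into 1 + sigma^2.
M = 1.0


def score_uncond(x, sigma):
    """Exact score of the noised two-mode mixture."""
    v = 1.0 + sigma**2
    return (-x + M * np.tanh(M * x / v)) / v


def score_cond(x, sigma):
    """Exact score of the noised conditional Gaussian."""
    v = 1.0 + sigma**2
    return -(x - M) / v


def score_cfg(x, sigma, gamma):
    """CFG combination: score of p(x)^(1 - gamma) * p(x | c)^gamma at this noise level."""
    return (1.0 - gamma) * score_uncond(x, sigma) + gamma * score_cond(x, sigma)


def pcg_sample(gamma, n=20000, levels=40, k=10, eps=0.05, seed=0):
    """Predictor-corrector guidance sketch on the toy problem.

    Predictor: one Euler step of the probability-flow ODE using the
    conditional score (a DDIM-style denoising step).
    Corrector: k Langevin steps targeting the gamma-powered
    distribution at the new noise level (the sharpening step).
    """
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(5.0, 0.01, levels)  # decreasing noise levels
    x = sigmas[0] * rng.standard_normal(n)    # start from wide Gaussian noise
    for s_hi, s_lo in zip(sigmas[:-1], sigmas[1:]):
        # Predictor: dx/dsigma = -sigma * score, stepped from s_hi down to s_lo.
        x = x + s_hi * (s_hi - s_lo) * score_cond(x, s_hi)
        # Corrector: unadjusted Langevin dynamics on the gamma-powered score.
        for _ in range(k):
            noise = rng.standard_normal(n)
            x = x + eps * score_cfg(x, s_lo, gamma) + np.sqrt(2.0 * eps) * noise
    return x


if __name__ == "__main__":
    for gamma in (1.0, 4.0):
        samples = pcg_sample(gamma)
        print(f"gamma={gamma}: mean={samples.mean():.3f}, std={samples.std():.3f}")
```

In this toy, gamma = 1 makes the corrector target the conditional distribution itself, so the samples should roughly match N(M, 1). A larger gamma should push samples away from the unconditional mode at -M and shrink their spread, which is the sharpening effect described in the caption.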
