View publication

Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation. This issue is often attributed to exposure bias - the difference between how a model is trained and how it is used during inference. Denoising diffusion models provide an alternative approach in which a model can revisit and revise its output. However, they can be computationally expensive, and prior efforts on text have led to models that produce less fluent output compared to autoregressive models, especially for longer text and paragraphs. In this paper, we propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation, to generate fluent text while exercising global control over paragraphs. The model achieves this by combining an autoregressive "decoding" module with a "planning" module that uses latent diffusion to generate semantic paragraph embeddings in a coarse-to-fine manner. The proposed method is evaluated on various conditional generation tasks, and results on semantic generation, text completion, and summarization show its effectiveness in generating high-quality long-form text in an efficient manner.

Related readings and updates.

Improving GFlowNets for Text-to-Image Diffusion Alignment

This paper was accepted at the Foundation Models in the Wild workshop at ICML 2024. Diffusion models have become the de-facto approach for generating visual data, which are trained to match the distribution of the training dataset. In addition, we also want to control generation to fulfill desired properties such as alignment to a text description, which can be specified with a black-box reward function. Prior works fine-tune pretrained diffusion…
See paper details

Enhancing Paragraph Generation with a Latent Language Diffusion Model

In the fast-evolving world of natural language processing (NLP), there is a strong demand for generating coherent and controlled text, as referenced in the work Toward Controlled Generation of Text. Traditional autoregressive models such as GPT, which have long been the industry standard, possess inherent limitations that sometimes manifest as repetitive and low-quality outputs, as seen in the work The Curious Case of Neural Text Degeneration. This is primarily due to a phenomenon known as "exposure bias," as seen in the work Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. This imperfection arises due to a mismatch between how these models are trained and their actual use during inference, often leading to error accumulation during text generation.

See highlight details