While large-scale neural language models such as GPT2 and BART have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops under maximization-based decoding algorithms (e.g., greedy search). This phenomenon is counter-intuitive, since consecutive sentence-level repetitions are rare in human-written corpora (e.g., 0.02% in Wikitext-103). To investigate why models generate consecutive sentence-level repetitions, we study the relationship between the probability of repetitive tokens and their previous repetitions in context. Our quantitative experiments show that 1) models tend to repeat the previous sentence; 2) sentence-level repetitions have a self-reinforcement effect: the more times a sentence is repeated in the context, the higher the probability of continuing to generate that sentence; and 3) sentences with higher initial probabilities usually have a stronger self-reinforcement effect. Motivated by these findings, we propose DITTO (PseuDo-RepetITion PenalizaTiOn), a simple and effective training method in which the model learns to penalize the probabilities of sentence-level repetitions using synthetic repetitive data. Although the method is motivated by mitigating repetitions, our experiments show that DITTO not only mitigates the repetition issue without sacrificing perplexity, but also achieves better generation quality. Extensive experiments on open-ended text generation (Wikitext-103) and text summarization (CNN/DailyMail) demonstrate the generality and effectiveness of our method.
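To make the two central notions of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows one plausible way to measure consecutive sentence-level repetition in a text and to build a synthetic repetitive sample by repeating a single sentence. The function names, the naive regex sentence splitter, and the choice of denominator (adjacent sentence pairs) are all assumptions made for illustration; the paper may define and compute these quantities differently.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def consecutive_repetition_rate(text):
    """Fraction of adjacent sentence pairs in which the second sentence
    is an exact copy of the first. Returns 0.0 for fewer than two sentences."""
    sentences = split_sentences(text)
    if len(sentences) < 2:
        return 0.0
    pairs = zip(sentences, sentences[1:])
    repeats = sum(1 for prev, cur in pairs if prev == cur)
    return repeats / (len(sentences) - 1)


def make_pseudo_repetition(sentence, times):
    """Hypothetical helper: build a synthetic repetitive sample by
    repeating one sentence `times` times, separated by spaces."""
    return " ".join([sentence] * times)


if __name__ == "__main__":
    text = "A cat sat. A cat sat. A dog ran."
    print(consecutive_repetition_rate(text))  # 0.5

    sample = make_pseudo_repetition("Hello there.", 3)
    print(sample)  # Hello there. Hello there. Hello there.
    print(consecutive_repetition_rate(sample))  # 1.0
```

A sample built this way is fully repetitive by construction, which is the kind of input on which a training objective could learn to lower the probability of continuing the loop. The penalization loss itself is not specified in the abstract and is therefore left out of this sketch.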

Related readings and updates.

Enhancing Paragraph Generation with a Latent Language Diffusion Model

In the fast-evolving field of natural language processing (NLP), there is strong demand for generating coherent and controlled text, as discussed in Toward Controlled Generation of Text. Traditional autoregressive models such as GPT, long the industry standard, have inherent limitations that sometimes manifest as repetitive, low-quality output, as shown in The Curious Case of Neural Text Degeneration. These limitations are primarily attributed to a phenomenon known as "exposure bias," described in Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks: a mismatch between how these models are trained and how they are used at inference time, which often leads to error accumulation during text generation.

Plan-then-Generate: Controlled Data-to-Text

Recent developments in neural networks have advanced data-to-text generation. However, the inability of neural models to control the structure of the generated output can be limiting in certain real-world applications. In this study, we propose a novel Plan-then-Generate (PlanGen) framework to improve the controllability of neural data-to-text models. Extensive experiments and analyses are conducted on two benchmark datasets, ToTTo…