Checklists Are Better Than Reward Models For Aligning Language Models

Authors: Vijay Viswanathan†, Yanchao Sun, Shuang Ma‡**, Xiang Kong, Meng Cao, Graham Neubig†, Tongshuang Wu†

Language models must be adapted to understand and follow user instructions. Reinforcement learning is widely used to facilitate this, typically using fixed criteria such as “helpfulness” and “harmfulness”. In our work, we instead propose using flexible, instruction-specific criteria as a means of broadening the impact that reinforcement learning can have in eliciting instruction following. We propose “Reinforcement Learning from Checklist Feedback” (RLCF). From instructions, we extract checklists and evaluate how well responses satisfy each item, using both AI judges and specialized verifier programs, then combine these scores to compute rewards for RL. We compare RLCF with other alignment methods applied to a strong instruction-following model (Qwen2.5-7B-Instruct) on five widely-studied benchmarks. RLCF is the only method to improve performance on every benchmark, including a 4-point boost in hard satisfaction rate on FollowBench, a 6-point increase on InFoBench, and a 3-point rise in win rate on Arena-Hard. These results establish checklist feedback as a key tool for improving language models’ support of queries that express a multitude of needs.

† Carnegie Mellon University
‡ Meta
** Work done while at Apple

Figure 1: We propose Reinforcement Learning from Checklist Feedback, where sampled responses are evaluated by a teacher model grounded on a fixed set of criteria. In our pipeline, we first generate checklists synthetically from the instructions, grade each response on each checklist item, combine per-item scores into a single weighted checklist score, then use this score for RL.
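
The step that combines per-item scores into a single weighted checklist score can be sketched as below. This is a minimal illustration, not the authors' implementation: the `ChecklistItem` class, the `checklist_reward` function, the 0-to-1 score range, and the choice of a normalized weighted mean are all assumptions made for the example, since the text above says only that per-item scores are combined into one weighted score.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ChecklistItem:
    """One requirement extracted from an instruction (hypothetical structure)."""
    requirement: str
    weight: float  # assumed importance weight; the scale is arbitrary


def checklist_reward(items: Sequence[ChecklistItem], scores: Sequence[float]) -> float:
    """Combine per-item scores into one scalar reward.

    Assumption: each score lies in [0, 1] (from an AI judge or a verifier
    program) and the reward is the weight-normalized mean of those scores.
    """
    if len(items) != len(scores):
        raise ValueError("expected exactly one score per checklist item")
    total_weight = sum(item.weight for item in items)
    if total_weight <= 0:
        raise ValueError("checklist weights must sum to a positive value")
    weighted_sum = sum(item.weight * score for item, score in zip(items, scores))
    return weighted_sum / total_weight


# Illustrative checklist for an instruction such as
# "Summarize the article in Spanish using at most 50 words."
example_items = [
    ChecklistItem("Is the response written in Spanish?", weight=100.0),
    ChecklistItem("Is the response at most 50 words long?", weight=50.0),
]
example_scores = [1.0, 0.5]  # made-up per-item grades for one sampled response
example_reward = checklist_reward(example_items, example_scores)
```

Normalizing by the total weight keeps rewards comparable across instructions whose checklists differ in length, which is one plausible reason to prefer a weighted mean over a raw weighted sum.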
