PREDICT: Preference Reasoning by Evaluating Decomposed preferences Inferred from Candidate Trajectories
Authors: Stéphane Aroca-Ouellette, Natalie Mackraz, Barry-John Theobald, Katherine Metcalf
Accommodating human preferences is essential for creating AI agents that deliver personalized and effective interactions. Recent work has shown the potential for LLMs to infer preferences from user interactions, but the inferred preferences are often broad and generic, failing to capture the unique and individualized nature of human preferences. This paper introduces PREDICT, a method designed to enhance the precision and adaptability of inferred preferences. PREDICT incorporates three key elements: (1) iterative refinement of inferred preferences, (2) decomposition of preferences into constituent components, and (3) validation of preferences across multiple trajectories. We evaluate PREDICT in two distinct environments: a gridworld setting and a new text-domain environment (PLUME). PREDICT more accurately infers nuanced human preferences, improving over existing baselines by 66.2% (gridworld environment) and 41.0% (PLUME).
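To make the three elements concrete, here is a minimal illustrative sketch of how an infer-decompose-validate loop could be wired together. It is not the paper's implementation: all function names (infer_preferences, decompose, is_consistent) and the stopping rule are hypothetical placeholders standing in for LLM calls described only at a high level in the abstract.

```python
from typing import Callable, List

def predict_loop(
    trajectories: List[str],
    infer_preferences: Callable[[List[str]], str],   # hypothetical: LLM infers a preference description
    decompose: Callable[[str], List[str]],           # hypothetical: splits it into constituent components
    is_consistent: Callable[[str, str], bool],       # hypothetical: checks a component against one trajectory
    max_iters: int = 3,
) -> List[str]:
    """Return preference components that hold across all candidate trajectories."""
    validated: List[str] = []
    for _ in range(max_iters):
        # (2) Decompose the broad inferred preference into components.
        description = infer_preferences(trajectories)
        components = decompose(description)

        # (3) Keep only components supported by every trajectory.
        validated = [
            c for c in components
            if all(is_consistent(c, t) for t in trajectories)
        ]

        # (1) Iteratively refine: stop once every component survives validation.
        if len(validated) == len(components):
            break
    return validated
```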
Aligning LLMs by Predicting Preferences from User Writing Samples
June 27, 2025 · Research areas: Human-Computer Interaction, Methods and Algorithms · Conference: ICML
Accommodating human preferences is essential for creating aligned LLM agents that deliver personalized and effective interactions. Recent work has shown the potential for LLMs acting as writing agents to infer a description of user preferences. Agent alignment then comes from conditioning on the inferred preference description. However, existing methods often produce generic preference descriptions that fail to capture the unique and…
Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison
October 23, 2024 · Research areas: Data Science and Annotation, Methods and Algorithms · Conference: NeurIPS
Aligning language models to human preferences requires data that reveal those preferences. Ideally, time and money would be spent carefully collecting and tailoring bespoke preference data for each downstream application. In practice, however, a select few publicly available preference datasets are often used to train reward models for reinforcement learning from human feedback (RLHF). While new preference datasets are being introduced…