
This paper was accepted at the "Human in the Loop Learning Workshop" at NeurIPS 2022.

Specifying reward functions for reinforcement learning is a challenging task. Preference-based learning methods bypass it by learning from preference labels on trajectory queries; however, they still require a large number of preference labels and often achieve poor reward recovery. We present the PRIOR framework, which addresses both the impractical number of queries to the human and the poor reward recovery by computing priors over the reward function from the environment dynamics and a surrogate preference classification model. We find that imposing these priors as soft constraints significantly reduces the number of queries made to the human in the loop and improves overall reward recovery. Additionally, we investigate computing these priors in an abstract state space to further improve the agent's performance.
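To make the idea of a soft prior constraint concrete, the sketch below shows one common way such a term could be combined with a standard preference-based reward learning objective. The network architecture, the Bradley-Terry preference loss, the `prior_reward` input (a stand-in for the dynamics-based prior described above), and the weight `beta` are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch: preference-based reward learning with a soft prior
# constraint, assuming a PyTorch-style setup. All names and hyperparameters
# here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # Per-step reward estimate for each (state, action) pair.
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(model, seg_a, seg_b, label, prior_reward, beta=0.1):
    """Bradley-Terry preference loss plus a soft constraint toward a prior.

    seg_a, seg_b : (obs, act) tensors of shape (B, T, dim) for two segments
    label        : (B,) 1.0 if segment A is preferred, else 0.0
    prior_reward : (B, T) prior reward estimates for segment A (assumed given)
    beta         : weight of the soft prior constraint
    """
    r_a = model(*seg_a)  # (B, T) predicted per-step rewards
    r_b = model(*seg_b)
    # Preference probability from summed segment returns (Bradley-Terry).
    logits = r_a.sum(dim=1) - r_b.sum(dim=1)
    bt_loss = F.binary_cross_entropy_with_logits(logits, label)
    # Soft constraint: keep predicted rewards close to the prior estimates.
    prior_loss = F.mse_loss(r_a, prior_reward)
    return bt_loss + beta * prior_loss
```

In this form, the prior acts as a regularizer rather than a hard restriction: the preference labels can still override it wherever the human feedback disagrees with the prior estimate.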

Related readings and updates.

Hindsight PRIORs for Reward Learning from Human Preferences

Preference-based Reinforcement Learning (PbRL) has shown great promise in learning from binary human preference feedback on an agent's trajectory behaviors, where a major goal is to reduce the amount of human feedback queried. While binary labels are a direct comment on the goodness of a trajectory behavior, there is still a need to resolve credit assignment, especially with limited feedback. We propose our work, PRIor On Rewards…

Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning

This paper was accepted at the "Human-in-the-Loop Learning Workshop" at NeurIPS 2022. Preference-based reinforcement learning (RL) algorithms help avoid the pitfalls of hand-crafted reward functions by distilling them from human preference feedback, but they remain impractical due to the burdensome number of labels required from the human, even for relatively simple tasks. In this work, we demonstrate that encoding environment…