Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
Authors: Yue Wu, Shuangfei Zhai, Nitish Srivastava, Josh Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh
Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration. However, existing Q-learning- and actor-critic-based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient in existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that models epistemic uncertainty to detect OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC outperforms existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
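To make the core idea concrete, below is a minimal PyTorch sketch of dropout-based (MC-dropout) uncertainty weighting of the critic loss. This is not the authors' released implementation: the network sizes, the `beta` coefficient, the clipping of the weights, and the helper names (`mc_dropout_stats`, `uncertainty_weighted_critic_loss`) are illustrative assumptions; the exact weighting scheme in the paper may differ.

```python
import torch
import torch.nn as nn


class DropoutQNetwork(nn.Module):
    """Q(s, a) network with dropout layers kept active for MC-dropout sampling."""

    def __init__(self, state_dim, action_dim, hidden=256, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def mc_dropout_stats(q_net, state, action, n_samples=10):
    """Mean and variance of Q(s, a) across repeated stochastic dropout passes."""
    q_net.train()  # keep dropout active at "inference" time
    with torch.no_grad():
        samples = torch.stack(
            [q_net(state, action) for _ in range(n_samples)], dim=0
        )
    return samples.mean(dim=0), samples.var(dim=0)


def uncertainty_weighted_critic_loss(q_net, target_q_net, batch, gamma=0.99, beta=1.0):
    """Bellman error down-weighted by the epistemic variance of the target Q.

    High variance on (s', a') signals a likely OOD pair, so its bootstrapped
    target contributes less to the critic update.
    """
    s, a, r, s_next, a_next, done = batch
    q_pred = q_net(s, a)
    target_mean, target_var = mc_dropout_stats(target_q_net, s_next, a_next)
    target = r + gamma * (1.0 - done) * target_mean
    # Weight inversely proportional to variance; clamp keeps weights bounded.
    weight = torch.clamp(beta / (target_var + 1e-6), max=1.0).detach()
    return (weight * (q_pred - target.detach()).pow(2)).mean()
```

The same variance-based weights can also be applied to the actor's objective, so that policy improvement likewise relies less on state-action pairs the critic is uncertain about.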
Apple sponsored the Neural Information Processing Systems (NeurIPS) conference, which was held virtually from December 6 to 12. NeurIPS is a global conference focused on fostering the exchange of research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects.