Goal-conditioned Offline Reinforcement Learning through State Space Partitioning

Mianchu Wang, Yue Jin, Giovanni Montana
Mar 2023
Abstract
Offline reinforcement learning (RL) aims to infer sequential decision policies using only offline datasets. This is a particularly difficult setup, especially when learning to achieve multiple different goals or outcomes under a given scenario with only sparse rewards. For offline learning of goal-conditioned policies via supervised learning, previous work has shown that an advantage weighted log-likelihood loss guarantees monotonic policy improvement. In this work we argue that, despite its benefits, this approach is still insufficient to fully address the distribution shift and multi-modality problems. The latter is particularly severe in long-horizon tasks where finding a unique and optimal policy that goes from a state to the desired goal is challenging as there may be multiple and potentially conflicting solutions. To tackle these challenges, we propose a complementary advantage-based weighting scheme that introduces an additional source of inductive bias: given a value-based partitioning of the state space, the contribution of actions expected to lead to target regions that are easier to reach, compared to the final goal, is further increased. Empirically, we demonstrate that the proposed approach, Dual-Advantage Weighted Offline Goal-conditioned RL (DAWOG), outperforms several competing offline algorithms in commonly used benchmarks. Analytically, we offer a guarantee that the learnt policy is never worse than the underlying behaviour policy.
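The abstract describes combining the standard goal-conditioned advantage weight with a second, partition-based weight. The following is only a minimal sketch of that idea, not the paper's implementation: the advantage estimators, the `beta` temperatures, and the function names (`dual_advantage_weight`, `weighted_nll`, `adv_goal`, `adv_region`) are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of a dual-advantage weighted supervised loss.
# Assumptions (not from the abstract): precomputed per-sample advantage
# estimates adv_goal (w.r.t. the final goal) and adv_region (w.r.t. the
# easier-to-reach target region from a value-based state-space partition),
# and exponential weighting with temperatures beta1, beta2.

def dual_advantage_weight(adv_goal, adv_region, beta1=1.0, beta2=1.0):
    """Combine two exponentiated advantages into one positive weight."""
    return np.exp(beta1 * adv_goal) * np.exp(beta2 * adv_region)

def weighted_nll(log_probs, adv_goal, adv_region):
    """Dual-advantage-weighted negative log-likelihood over a batch."""
    w = dual_advantage_weight(adv_goal, adv_region)
    return -np.mean(w * log_probs)

# Toy batch: actions with higher combined advantage contribute more
# strongly to the behaviour-cloning objective.
log_probs = np.array([-0.5, -1.2, -0.1])   # log pi(a|s, g) for 3 samples
adv_goal = np.array([0.2, -0.3, 0.5])
adv_region = np.array([0.1, 0.0, 0.4])
loss = weighted_nll(log_probs, adv_goal, adv_region)
```

In this simplified form, both weights multiply the log-likelihood of dataset actions, so the objective stays a (re-weighted) supervised loss over the offline data rather than requiring on-policy rollouts.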