Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout.

Author Information

Wang Haoran, Tang Zeshen, Sun Yaoru, Wang Fang, Zhang Siyu, Chen Yeming

Publication Information

IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):8455-8469. doi: 10.1109/TNNLS.2024.3425809. Epub 2025 May 2.

Abstract

Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened interlevel communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet, most existing goal-conditioned HRL algorithms have focused primarily on subgoal discovery, neglecting interlevel cooperation. Here, we propose a novel goal-conditioned HRL framework named Guided Cooperation via Model-Based Rollout (GCMR; code is available at https://github.com/HaoranWang-TJ/GCMR_ACLG_official), which aims to bridge interlevel information synchronization and cooperation by exploiting forward dynamics. First, GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by unseen subgoals and states, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy that is conducive to effective exploration. Third, we propose one-step rollout-based planning that uses the higher-level critic to guide the lower-level policy: the value of future states of the lower-level policy is estimated with the higher-level critic function, thereby transmitting global task information downward to avoid local pitfalls. These three components are expected to significantly facilitate interlevel cooperation. Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of hierarchical reinforcement learning guided by landmarks (HIGL), namely adjacency constraint and landmark-guided planning (ACLG), yields more stable and robust policy improvement than various baselines and significantly outperforms previous state-of-the-art (SOTA) algorithms.
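
The abstract's third component, one-step rollout-based planning, lends itself to a short illustration. The sketch below is not the authors' released implementation; it is a minimal Python/PyTorch illustration under assumed module names, dimensions, and loss weighting: a learned forward dynamics model imagines the next state under the lower-level action, and the higher-level critic scores that imagined state so that global task value can be back-propagated into the lower-level policy.

```python
# Minimal sketch (illustrative, not the released GCMR code) of one-step
# rollout-based planning: the higher-level critic evaluates a model-imagined
# next state to guide the lower-level policy. All names/shapes are assumptions.
import torch
import torch.nn as nn

STATE_DIM, GOAL_DIM, ACTION_DIM = 10, 3, 4

def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

dynamics = mlp(STATE_DIM + ACTION_DIM, STATE_DIM)    # learned forward model f(s, a) -> s'
low_policy = mlp(STATE_DIM + GOAL_DIM, ACTION_DIM)   # goal-conditioned lower-level policy
high_critic = mlp(STATE_DIM + GOAL_DIM, 1)           # higher-level value function V_hi(s, g)

def guided_cooperation_loss(state, subgoal, task_goal):
    """One-step model-based rollout scored by the higher-level critic."""
    action = torch.tanh(low_policy(torch.cat([state, subgoal], dim=-1)))
    next_state = dynamics(torch.cat([state, action], dim=-1))   # imagined next state
    # Higher-level value of the imagined state w.r.t. the overall task goal;
    # maximizing it passes global task information down to the lower level.
    value = high_critic(torch.cat([next_state, task_goal], dim=-1))
    return -value.mean()

# Toy usage: one gradient step on the lower-level policy only.
opt = torch.optim.Adam(low_policy.parameters(), lr=3e-4)
s = torch.randn(32, STATE_DIM)
g_sub = torch.randn(32, GOAL_DIM)
g_task = torch.randn(32, GOAL_DIM)
loss = guided_cooperation_loss(s, g_sub, g_task)
opt.zero_grad()
loss.backward()
opt.step()
```

In practice this guidance term would be combined with the usual lower-level TD objective (and, per the abstract, with the gradient penalty on the lower-level Q-function), so the lower-level policy still pursues its subgoal while avoiding states the higher level considers globally poor.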
