Suppr 超能文献



Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout.

Authors

Wang Haoran, Tang Zeshen, Sun Yaoru, Wang Fang, Zhang Siyu, Chen Yeming

Publication

IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):8455-8469. doi: 10.1109/TNNLS.2024.3425809. Epub 2025 May 2.

DOI: 10.1109/TNNLS.2024.3425809
PMID: 39133586
Abstract

Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened interlevel communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet, most existing goal-conditioned HRL algorithms have primarily focused on subgoal discovery, neglecting interlevel cooperation. Here, we propose a novel goal-conditioned HRL framework named Guided Cooperation via Model-Based Rollout (GCMR; code is available at https://github.com/HaoranWang-TJ/GCMR_ACLG_official), aiming to bridge interlayer information synchronization and cooperation by exploiting forward dynamics. First, the GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by unseen subgoals and states, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Third, we propose one-step rollout-based planning that uses higher-level critics to guide the lower-level policy. Specifically, we estimate the value of future states of the lower-level policy using the higher-level critic function, thereby transmitting global task information downward to avoid local pitfalls. These three critical components in GCMR are expected to significantly facilitate interlevel cooperation. Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of hierarchical reinforcement learning guided by landmarks (HIGL), namely, adjacency constraint and landmark-guided planning (ACLG), yields more stable and robust policy improvement compared with various baselines and significantly outperforms previous state-of-the-art (SOTA) algorithms.
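The third component, one-step rollout-based planning, can be sketched in a few lines: each candidate low-level action is scored by the higher-level critic's value of the model-predicted next state, so global task information flows downward to the lower level. Everything below (the toy linear dynamics, the distance-to-origin critic, and all function names) is an illustrative assumption, not the paper's implementation.

```python
import math

def one_step_rollout_value(state, candidate_actions, dynamics_model, high_level_critic):
    """Score each candidate low-level action by rolling the learned forward
    dynamics out one step and evaluating the result with the higher-level critic."""
    scores = []
    for action in candidate_actions:
        next_state = dynamics_model(state, action)   # one-step model-based rollout
        scores.append(high_level_critic(next_state)) # global (higher-level) value
    return scores

# Toy stand-ins: linear dynamics, and a critic that prefers states near the origin.
dynamics = lambda s, a: [si + 0.1 * ai for si, ai in zip(s, a)]
critic = lambda s: -math.hypot(*s)

state = [1.0, -1.0]
actions = [[-1.0, 1.0], [1.0, -1.0]]
scores = one_step_rollout_value(state, actions, dynamics, critic)
best = actions[scores.index(max(scores))]  # the action that moves toward the origin
```

In the full method this score would be combined with the lower-level policy's own objective rather than used alone, so the higher-level critic guides, but does not replace, the behavioral policy.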


Similar Articles

1. Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout.
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):8455-8469. doi: 10.1109/TNNLS.2024.3425809. Epub 2025 May 2.
2. HCPI-HRL: Human Causal Perception and Inference-driven Hierarchical Reinforcement Learning.
Neural Netw. 2025 Jul;187:107318. doi: 10.1016/j.neunet.2025.107318. Epub 2025 Mar 6.
3. Adjacency Constraint for Efficient Hierarchical Reinforcement Learning.
IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4152-4166. doi: 10.1109/TPAMI.2022.3192418. Epub 2023 Mar 7.
4. End-to-End Hierarchical Reinforcement Learning With Integrated Subgoal Discovery.
IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7778-7790. doi: 10.1109/TNNLS.2021.3087733. Epub 2022 Nov 30.
5. Reinforcement Learning From Hierarchical Critics.
IEEE Trans Neural Netw Learn Syst. 2023 Feb;34(2):1066-1073. doi: 10.1109/TNNLS.2021.3103642. Epub 2023 Feb 3.
6. Highly valued subgoal generation for efficient goal-conditioned reinforcement learning.
Neural Netw. 2025 Jan;181:106825. doi: 10.1016/j.neunet.2024.106825. Epub 2024 Oct 28.
7. Goal-Conditioned Hierarchical Reinforcement Learning With High-Level Model Approximation.
IEEE Trans Neural Netw Learn Syst. 2025 Feb;36(2):2705-2719. doi: 10.1109/TNNLS.2024.3354061. Epub 2025 Feb 6.
8. Vision-Based Robot Navigation through Combining Unsupervised Learning and Hierarchical Reinforcement Learning.
Sensors (Basel). 2019 Apr 1;19(7):1576. doi: 10.3390/s19071576.
9. Human-in-the-Loop Reinforcement Learning in Continuous-Action Space.
IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):15735-15744. doi: 10.1109/TNNLS.2023.3289315. Epub 2024 Oct 29.
10. Episodic Memory-Double Actor-Critic Twin Delayed Deep Deterministic Policy Gradient.
Neural Netw. 2025 Jul;187:107286. doi: 10.1016/j.neunet.2025.107286. Epub 2025 Feb 27.