



Adjacency Constraint for Efficient Hierarchical Reinforcement Learning.

Authors

Zhang Tianren, Guo Shangqi, Tan Tian, Hu Xiaolin, Chen Feng

Publication

IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4152-4166. doi: 10.1109/TPAMI.2022.3192418. Epub 2023 Mar 7.

DOI: 10.1109/TPAMI.2022.3192418
PMID: 35853052
Abstract

Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is large. Searching in a large goal space poses difficulty for both high-level subgoal generation and low-level policy learning. In this article, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a k-step adjacent region of the current state using an adjacency constraint. We theoretically prove that in a deterministic Markov Decision Process (MDP), the proposed adjacency constraint preserves the optimal hierarchical policy, while in a stochastic MDP the adjacency constraint induces a bounded state-value suboptimality determined by the MDP's transition structure. We further show that this constraint can be practically implemented by training an adjacency network that can discriminate between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks including challenging simulated robot locomotion and manipulation tasks show that incorporating the adjacency constraint significantly boosts the performance of state-of-the-art goal-conditioned HRL approaches.
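The k-step adjacency idea can be made concrete: in a deterministic MDP, the k-step adjacent region of a state is simply the set of states reachable within k transitions, which breadth-first search recovers exactly. The paper approximates this region with a learned adjacency network; the minimal sketch below instead computes it exactly on a toy chain MDP, purely to illustrate how the constraint shrinks the high-level action space. The transition table and variable names are illustrative assumptions, not the authors' code.

```python
from collections import deque

def k_step_adjacent(transitions, state, k):
    """BFS over a deterministic transition graph: return every state
    reachable from `state` in at most k steps -- the k-step adjacent
    region to which the adjacency constraint restricts subgoals."""
    frontier = deque([(state, 0)])
    adjacent = {state}
    while frontier:
        s, depth = frontier.popleft()
        if depth == k:
            continue  # states beyond k steps are non-adjacent
        for nxt in transitions.get(s, []):
            if nxt not in adjacent:
                adjacent.add(nxt)
                frontier.append((nxt, depth + 1))
    return adjacent

# Toy deterministic chain MDP: 0 -> 1 -> 2 -> 3 -> 4
chain = {i: [i + 1] for i in range(4)}

goal_space = set(range(5))          # unconstrained high-level action space
k = 2
# Constrained space: only subgoals within 2 steps of the current state 0
subgoals = goal_space & k_step_adjacent(chain, 0, k)
print(subgoals)                     # the high-level now searches 3 goals, not 5
```

In the paper's continuous-control setting, where enumeration is impossible, this exact BFS is replaced by an adjacency network trained to discriminate adjacent from non-adjacent subgoals, serving the same role as the reachability check above.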


Similar Articles

1. Adjacency Constraint for Efficient Hierarchical Reinforcement Learning.
IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4152-4166. doi: 10.1109/TPAMI.2022.3192418. Epub 2023 Mar 7.
2. End-to-End Hierarchical Reinforcement Learning With Integrated Subgoal Discovery.
IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7778-7790. doi: 10.1109/TNNLS.2021.3087733. Epub 2022 Nov 30.
3. Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout.
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):8455-8469. doi: 10.1109/TNNLS.2024.3425809. Epub 2025 May 2.
4. Vision-Based Robot Navigation through Combining Unsupervised Learning and Hierarchical Reinforcement Learning.
Sensors (Basel). 2019 Apr 1;19(7):1576. doi: 10.3390/s19071576.
5. Human-in-the-Loop Reinforcement Learning in Continuous-Action Space.
IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):15735-15744. doi: 10.1109/TNNLS.2023.3289315. Epub 2024 Oct 29.
6. Goal-Conditioned Hierarchical Reinforcement Learning With High-Level Model Approximation.
IEEE Trans Neural Netw Learn Syst. 2025 Feb;36(2):2705-2719. doi: 10.1109/TNNLS.2024.3354061. Epub 2025 Feb 6.
7. Hierarchical approximate policy iteration with binary-tree state space decomposition.
IEEE Trans Neural Netw. 2011 Dec;22(12):1863-77. doi: 10.1109/TNN.2011.2168422. Epub 2011 Oct 10.
8. Improvement of Reinforcement Learning With Supermodularity.
IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5298-5309. doi: 10.1109/TNNLS.2023.3244024. Epub 2023 Sep 1.
9. Cooperative modular reinforcement learning for large discrete action space problem.
Neural Netw. 2023 Apr;161:281-296. doi: 10.1016/j.neunet.2023.01.046. Epub 2023 Feb 2.
10. Generative subgoal oriented multi-agent reinforcement learning through potential field.
Neural Netw. 2024 Nov;179:106552. doi: 10.1016/j.neunet.2024.106552. Epub 2024 Jul 17.