

Progressively Learning to Reach Remote Goals by Continuously Updating Boundary Goals.

Authors

Shao Mengxuan, Zhu Haiqi, Zhao Debin, Han Kun, Jiang Feng, Liu Shaohui, Zhang Wei

Publication

IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9238-9252. doi: 10.1109/TNNLS.2024.3428323. Epub 2025 May 2.

DOI: 10.1109/TNNLS.2024.3428323
PMID: 39302800
Abstract

Training an effective policy on complex goal-reaching tasks with sparse rewards is an open challenge. It is more difficult for the task of reaching remote goals (RRG), as the unavailability of the original rewards and large Wasserstein distance between the distributions of desired goals and initial states make existing methods for common goal-reaching tasks inefficient or even completely ineffective. In this article, we propose progressively learning to reach remote goals by continuously updating boundary goals (PLUB), which solves RRG tasks by reducing the Wasserstein distance between the distributions of boundary goals and desired goals. Specifically, the concept of boundary goal is introduced, which is the set of the closest achieved goals for each desired goal. In addition, to reduce the computational complexity caused by the Wasserstein distance, the closest moving distance is introduced, which is its upper bound, and also the expectation of the distance between the desired goal and the closest boundary goal. By selecting the appropriate intermediate goal from all boundary goals and continuously updating boundary goals, both the closest moving distance and the Wasserstein distance can be reduced. As a result, RRG tasks degenerate into common goal-reaching tasks that can be efficiently solved by a combination of hindsight relabeling and the learning from demonstrations (LfD) method. Extensive experiments on several robotic manipulation tasks demonstrate that PLUB can bring substantial improvements over the existing methods.
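The "closest moving distance" the abstract introduces (the expected distance from each desired goal to its closest achieved, i.e. boundary, goal, which upper-bounds the Wasserstein distance) can be sketched as follows. This is a minimal illustration assuming a Euclidean goal space; the function and variable names are invented for this sketch and are not taken from the paper's implementation:

```python
import numpy as np

def closest_moving_distance(desired_goals, achieved_goals):
    """Expected distance from each desired goal to its nearest achieved
    (boundary) goal -- an upper bound on the Wasserstein distance between
    the two goal distributions, as described in the abstract."""
    desired = np.asarray(desired_goals, dtype=float)
    achieved = np.asarray(achieved_goals, dtype=float)
    # Pairwise Euclidean distances, shape (n_desired, n_achieved).
    diffs = desired[:, None, :] - achieved[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # The boundary goal for each desired goal is its nearest achieved goal.
    nearest = dists.min(axis=1)
    # Expectation over the (empirical) desired-goal distribution.
    return float(nearest.mean())
```

Driving this quantity down by selecting intermediate goals among the boundary goals is what lets the method shrink the Wasserstein distance without computing it directly.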


Similar Articles

1. Progressively Learning to Reach Remote Goals by Continuously Updating Boundary Goals.
   IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9238-9252. doi: 10.1109/TNNLS.2024.3428323. Epub 2025 May 2.
2. Complex Robotic Manipulation via Graph-Based Hindsight Goal Generation.
   IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7863-7876. doi: 10.1109/TNNLS.2021.3088947. Epub 2022 Nov 30.
3. Robotic Manipulation in Dynamic Scenarios via Bounding-Box-Based Hindsight Goal Generation.
   IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):5037-5050. doi: 10.1109/TNNLS.2021.3124366. Epub 2023 Aug 4.
4. Highly valued subgoal generation for efficient goal-conditioned reinforcement learning.
   Neural Netw. 2025 Jan;181:106825. doi: 10.1016/j.neunet.2024.106825. Epub 2024 Oct 28.
5. Compact Goal Representation Learning via Information Bottleneck in Goal-Conditioned Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 Jan 8;PP. doi: 10.1109/TNNLS.2023.3344880.
6. Curriculum learning with Hindsight Experience Replay for sequential object manipulation tasks.
   Neural Netw. 2022 Jan;145:260-270. doi: 10.1016/j.neunet.2021.10.011. Epub 2021 Oct 21.
7. Aggregated Wasserstein Distance and State Registration for Hidden Markov Models.
   IEEE Trans Pattern Anal Mach Intell. 2020 Sep;42(9):2133-2147. doi: 10.1109/TPAMI.2019.2908635. Epub 2019 Apr 1.
8. Infants rationally infer the goals of other people's reaches in the absence of first-person experience with reaching actions.
   Dev Sci. 2024 May;27(3):e13453. doi: 10.1111/desc.13453. Epub 2023 Nov 5.
9. Addressing Hindsight Bias in Multigoal Reinforcement Learning.
   IEEE Trans Cybern. 2023 Jan;53(1):392-405. doi: 10.1109/TCYB.2021.3107202. Epub 2022 Dec 23.
10. Clustering-based Failed goal Aware Hindsight Experience Replay.
    PeerJ Comput Sci. 2024 Dec 12;10:e2588. doi: 10.7717/peerj-cs.2588. eCollection 2024.