

Progressively Learning to Reach Remote Goals by Continuously Updating Boundary Goals.

Authors

Shao Mengxuan, Zhu Haiqi, Zhao Debin, Han Kun, Jiang Feng, Liu Shaohui, Zhang Wei

Publication

IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9238-9252. doi: 10.1109/TNNLS.2024.3428323. Epub 2025 May 2.

DOI: 10.1109/TNNLS.2024.3428323
PMID: 39302800
Abstract

Training an effective policy on complex goal-reaching tasks with sparse rewards is an open challenge. It is more difficult for the task of reaching remote goals (RRG), as the unavailability of the original rewards and large Wasserstein distance between the distributions of desired goals and initial states make existing methods for common goal-reaching tasks inefficient or even completely ineffective. In this article, we propose progressively learning to reach remote goals by continuously updating boundary goals (PLUB), which solves RRG tasks by reducing the Wasserstein distance between the distributions of boundary goals and desired goals. Specifically, the concept of boundary goal is introduced, which is the set of the closest achieved goals for each desired goal. In addition, to reduce the computational complexity caused by the Wasserstein distance, the closest moving distance is introduced, which is its upper bound, and also the expectation of the distance between the desired goal and the closest boundary goal. By selecting the appropriate intermediate goal from all boundary goals and continuously updating boundary goals, both the closest moving distance and the Wasserstein distance can be reduced. As a result, RRG tasks degenerate into common goal-reaching tasks that can be efficiently solved by a combination of hindsight relabeling and the learning from demonstrations (LfD) method. Extensive experiments on several robotic manipulation tasks demonstrate that PLUB can bring substantial improvements over the existing methods.
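The "closest moving distance" the abstract introduces (the expected distance from each desired goal to its closest achieved, i.e. boundary, goal, which upper-bounds the Wasserstein distance) can be sketched as follows. This is a minimal illustration assuming a Euclidean goal space; the function and variable names are invented for this sketch and are not taken from the paper's implementation:

```python
import numpy as np

def closest_moving_distance(desired_goals, achieved_goals):
    """Expected distance from each desired goal to its nearest achieved
    (boundary) goal -- an upper bound on the Wasserstein distance between
    the two goal distributions, as described in the abstract."""
    desired = np.asarray(desired_goals, dtype=float)
    achieved = np.asarray(achieved_goals, dtype=float)
    # Pairwise Euclidean distances, shape (n_desired, n_achieved).
    diffs = desired[:, None, :] - achieved[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # The boundary goal for each desired goal is its nearest achieved goal.
    nearest = dists.min(axis=1)
    # Expectation over the (empirical) desired-goal distribution.
    return float(nearest.mean())
```

Driving this quantity down by selecting intermediate goals among the boundary goals is what lets the method shrink the Wasserstein distance without computing it directly.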


Similar Articles

1. Progressively Learning to Reach Remote Goals by Continuously Updating Boundary Goals.
   IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9238-9252. doi: 10.1109/TNNLS.2024.3428323. Epub 2025 May 2.
2. Complex Robotic Manipulation via Graph-Based Hindsight Goal Generation.
   IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7863-7876. doi: 10.1109/TNNLS.2021.3088947. Epub 2022 Nov 30.
3. Robotic Manipulation in Dynamic Scenarios via Bounding-Box-Based Hindsight Goal Generation.
   IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):5037-5050. doi: 10.1109/TNNLS.2021.3124366. Epub 2023 Aug 4.
4. Highly valued subgoal generation for efficient goal-conditioned reinforcement learning.
   Neural Netw. 2025 Jan;181:106825. doi: 10.1016/j.neunet.2024.106825. Epub 2024 Oct 28.
5. Compact Goal Representation Learning via Information Bottleneck in Goal-Conditioned Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 Jan 8;PP. doi: 10.1109/TNNLS.2023.3344880.
6. Curriculum learning with Hindsight Experience Replay for sequential object manipulation tasks.
   Neural Netw. 2022 Jan;145:260-270. doi: 10.1016/j.neunet.2021.10.011. Epub 2021 Oct 21.
7. Aggregated Wasserstein Distance and State Registration for Hidden Markov Models.
   IEEE Trans Pattern Anal Mach Intell. 2020 Sep;42(9):2133-2147. doi: 10.1109/TPAMI.2019.2908635. Epub 2019 Apr 1.
8. Infants rationally infer the goals of other people's reaches in the absence of first-person experience with reaching actions.
   Dev Sci. 2024 May;27(3):e13453. doi: 10.1111/desc.13453. Epub 2023 Nov 5.
9. Addressing Hindsight Bias in Multigoal Reinforcement Learning.
   IEEE Trans Cybern. 2023 Jan;53(1):392-405. doi: 10.1109/TCYB.2021.3107202. Epub 2022 Dec 23.
10. Clustering-based Failed goal Aware Hindsight Experience Replay.
    PeerJ Comput Sci. 2024 Dec 12;10:e2588. doi: 10.7717/peerj-cs.2588. eCollection 2024.