

Clustering-based Failed goal Aware Hindsight Experience Replay

Authors

Kim Taeyoung, Kang Taemin, Jeong Haechan, Har Dongsoo

Affiliations

CCS Graduate School of Mobility, Korea Advanced Institute of Science & Technology, Daejeon, Republic of Korea.

The Robotics Program, Korea Advanced Institute of Science & Technology, Daejeon, Republic of Korea.

Publication

PeerJ Comput Sci. 2024 Dec 12;10:e2588. doi: 10.7717/peerj-cs.2588. eCollection 2024.

DOI: 10.7717/peerj-cs.2588
PMID: 39896403
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11784800/
Abstract

In a multi-goal reinforcement learning environment, an agent learns a policy to perform tasks with multiple goals from experiences gained through exploration. In environments with sparse binary rewards, the replay buffer contains few successful experiences, posing a challenge for sampling efficiency. To address this, Hindsight Experience Replay (HER) generates successful experiences, named hindsight experiences, from unsuccessful ones. However, uniform sampling of experiences for the process of HER can lead to inefficient scenarios of generating hindsight experience. In this paper, a novel method called Failed goal Aware HER (FAHER) is proposed to enhance sampling efficiency. This method considers the properties of achieved goals with respect to failed goals during sampling. To account for these properties, a cluster model is used to cluster episodes in the replay buffer, and experiences are subsequently sampled in the manner of HER. The proposed method is validated through experiments on three robotic control tasks from the OpenAI Gym. The experimental results demonstrate that the proposed method is more sample-efficient and achieves improved performance over baseline approaches.
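The abstract describes two mechanisms working together: hindsight relabeling of failed episodes with goals that were actually achieved, and a cluster model that groups episodes in the replay buffer by their failed goals before sampling. The sketch below illustrates that combination in minimal form. It is not the authors' implementation: the "final" relabeling strategy, the plain k-means routine standing in for the paper's cluster model, and all field names (`goal`, `achieved`, `states`, `actions`) are illustrative assumptions.

```python
import numpy as np

def kmeans(points, k, iters=10, seed=0):
    """Minimal k-means, standing in for the paper's cluster model."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

def relabel_with_hindsight(episodes, k=2, seed=0):
    """Cluster failed episodes by their intended (failed) goal, then relabel
    each transition with the goal the episode actually achieved at its end
    (the 'final' hindsight strategy), using a sparse binary reward."""
    failed_goals = np.array([ep["goal"] for ep in episodes])
    labels, _ = kmeans(failed_goals, k, seed=seed)
    hindsight = []
    for ep, label in zip(episodes, labels):
        achieved_final = ep["achieved"][-1]  # substitute goal for the whole episode
        for s, a, ag in zip(ep["states"], ep["actions"], ep["achieved"]):
            # Reward 0 when the achieved goal matches the relabeled goal, else -1.
            reward = 0.0 if np.allclose(ag, achieved_final) else -1.0
            hindsight.append({"state": s, "action": a, "goal": achieved_final,
                              "reward": reward, "cluster": int(label)})
    return hindsight
```

A sampler built on this could then draw experiences per cluster rather than uniformly over the buffer, which is the property the paper exploits to avoid inefficient hindsight-generation scenarios.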


Figure (from article): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47d3/11784800/fcfad8b9e5ab/peerj-cs-10-2588-g003.jpg

