

AHEGC: Adaptive Hindsight Experience Replay With Goal-Amended Curiosity Module for Robot Control

Authors

Zeng Hongliang, Zhang Ping, Li Fang, Lin Chubin, Zhou Junkang

Publication

IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):16602-16615. doi: 10.1109/TNNLS.2023.3296765. Epub 2024 Oct 29.

DOI: 10.1109/TNNLS.2023.3296765
PMID: 37527323
Abstract

With shaped reward functions, reinforcement learning (RL) has recently been applied successfully to several robot control tasks. However, designing a task-relevant, well-performing reward function takes time and effort. Training an agent directly in a sparse-reward environment would sidestep reward-function design, but doing so remains a significant challenge. To address this issue, the pioneering hindsight experience replay (HER) method dramatically enhances the probability of acquiring skills in sparse-reward environments by transforming unsuccessful experiences into helpful training samples. However, HER still requires a lengthy training period. In this article, we propose a new HER-based technique, termed adaptive HER with goal-amended curiosity module (AHEGC), to further enhance sample and exploration efficiency. Specifically, an adaptive adjustment strategy for the hindsight experience (HE) sampling rate and reward weights is developed to enhance sample efficiency. Furthermore, we introduce a curiosity mechanism to encourage more efficient exploration of the environment, and propose a goal-amended (GA) curiosity module to counter the over-seeking of novelty that the introduced curiosity would otherwise cause. We conducted experiments on six demanding robot control tasks with binary rewards, including the Fetch and Hand environments. The results show that the proposed method outperforms existing methods in learning ability and convergence speed.
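HER's core idea (relabeling failed rollouts with goals the agent actually achieved, so a sparse binary reward still produces useful learning signal) can be sketched in a few lines. The snippet below is an illustrative, simplified implementation of the standard "future" relabeling strategy under a Fetch/Hand-style binary reward; it is not the authors' AHEGC code, and all function and field names are hypothetical.

```python
import numpy as np

def sparse_reward(achieved, goal, tol=0.05):
    # Binary reward typical of Fetch/Hand tasks: 0 on success, -1 otherwise.
    dist = np.linalg.norm(np.asarray(achieved) - np.asarray(goal))
    return 0.0 if dist < tol else -1.0

def her_relabel(episode, reward_fn, k=4, rng=None):
    """Relabel a (possibly failed) episode with achieved goals ('future' strategy).

    `episode` is a list of transition dicts with keys:
    obs, action, next_obs, achieved_goal, goal.
    For each transition, k goals achieved later in the same episode are
    substituted for the original goal, and the reward is recomputed.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    relabeled = []
    T = len(episode)
    for t, tr in enumerate(episode):
        # Keep the original transition (usually reward -1 on failure).
        relabeled.append({**tr, "reward": reward_fn(tr["achieved_goal"], tr["goal"])})
        # Sample up to k future timesteps; their achieved goals become new goals.
        future_idx = rng.integers(t, T, size=min(k, T - t))
        for i in future_idx:
            g = episode[i]["achieved_goal"]
            relabeled.append({**tr, "goal": g,
                              "reward": reward_fn(tr["achieved_goal"], g)})
    return relabeled
```

In AHEGC, the fraction of such hindsight samples drawn during training is not fixed: per the abstract, the HE sampling rate and reward weights are adapted over the course of training.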

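The goal-amended curiosity idea couples an intrinsic novelty bonus to task progress, so that curiosity does not degenerate into aimless novelty-seeking. The following is a speculative numeric sketch of one such coupling (forward-model prediction error damped by distance to the desired goal); the formula, names, and coefficients are illustrative assumptions, not the authors' GA module.

```python
import numpy as np

def goal_amended_bonus(pred_next, next_state, achieved_goal, desired_goal,
                       eta=0.1, beta=1.0):
    """Hypothetical goal-amended intrinsic reward.

    Plain curiosity: squared forward-model prediction error on the next state.
    Goal amendment (assumed form): scale the bonus down as the achieved goal
    drifts away from the desired goal, discouraging novelty-seeking that
    ignores the task.
    """
    prediction_error = np.sum((np.asarray(pred_next) - np.asarray(next_state)) ** 2)
    goal_distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return eta * prediction_error / (1.0 + beta * goal_distance)
```

Under this assumed form, two states with equal prediction error yield a larger bonus when the agent is closer to the desired goal, which is one way to read the abstract's claim that the GA module curbs "over-seeking novelty".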

Similar Articles

1. AHEGC: Adaptive Hindsight Experience Replay With Goal-Amended Curiosity Module for Robot Control.
IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):16602-16615. doi: 10.1109/TNNLS.2023.3296765. Epub 2024 Oct 29.
2. Sampling Rate Decay in Hindsight Experience Replay for Robot Control.
IEEE Trans Cybern. 2022 Mar;52(3):1515-1526. doi: 10.1109/TCYB.2020.2990722. Epub 2022 Mar 11.
3. Addressing Hindsight Bias in Multigoal Reinforcement Learning.
IEEE Trans Cybern. 2023 Jan;53(1):392-405. doi: 10.1109/TCYB.2021.3107202. Epub 2022 Dec 23.
4. Autonomous Driving of Mobile Robots in Dynamic Environments Based on Deep Deterministic Policy Gradient: Reward Shaping and Hindsight Experience Replay.
Biomimetics (Basel). 2024 Jan 13;9(1):0. doi: 10.3390/biomimetics9010051.
5. Biped Robots Control in Gusty Environments with Adaptive Exploration Based DDPG.
Biomimetics (Basel). 2024 Jun 8;9(6):346. doi: 10.3390/biomimetics9060346.
6. Learning robotic manipulation skills with multiple semantic goals by conservative curiosity-motivated exploration.
Front Neurorobot. 2023 Mar 7;17:1089270. doi: 10.3389/fnbot.2023.1089270. eCollection 2023.
7. Deep Deterministic Policy Gradient-Based Autonomous Driving for Mobile Robots in Sparse Reward Environments.
Sensors (Basel). 2022 Dec 7;22(24):9574. doi: 10.3390/s22249574.
8. Complex Robotic Manipulation via Graph-Based Hindsight Goal Generation.
IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7863-7876. doi: 10.1109/TNNLS.2021.3088947. Epub 2022 Nov 30.
9. Robotic Manipulation in Dynamic Scenarios via Bounding-Box-Based Hindsight Goal Generation.
IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):5037-5050. doi: 10.1109/TNNLS.2021.3124366. Epub 2023 Aug 4.
10. Enhancing Stability and Performance in Mobile Robot Path Planning with PMR-Dueling DQN Algorithm.
Sensors (Basel). 2024 Feb 27;24(5):1523. doi: 10.3390/s24051523.