
Learning robotic manipulation skills with multiple semantic goals by conservative curiosity-motivated exploration.

Authors

Han Changlin, Peng Zhiyong, Liu Yadong, Tang Jingsheng, Yu Yang, Zhou Zongtan

Affiliation

Department of Intelligence Science and Technology, College of Intelligence Science, National University of Defense Technology, Changsha, China.

Publication

Front Neurorobot. 2023 Mar 7;17:1089270. doi: 10.3389/fnbot.2023.1089270. eCollection 2023.

DOI: 10.3389/fnbot.2023.1089270
PMID: 36960195
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10028088/
Abstract

Reinforcement learning (RL) empowers an agent to learn robotic manipulation skills autonomously. Compared with traditional single-goal RL, semantic-goal-conditioned RL expands the agent's capacity to accomplish multiple semantic manipulation instructions. However, because semantic goals are sparsely distributed and agent-environment interactions yield sparse rewards, a hard-exploration problem arises and impedes agent training. In traditional RL, curiosity-motivated exploration is effective at solving the hard-exploration problem. In semantic-goal-conditioned RL, however, the performance of previous curiosity-motivated methods deteriorates, which we attribute to two defects: uncontrollability and distraction. To address these defects, we propose a conservative curiosity-motivated method named mutual information motivation with hybrid policy mechanism (MIHM). MIHM contributes two innovations: a decoupled-mutual-information-based intrinsic motivation, which prevents uncontrollable curiosity from driving the agent into dangerous states; and a precisely trained, automatically switched hybrid policy mechanism, which eliminates the distraction from the curiosity-motivated policy and achieves the optimal balance of exploration and exploitation. Compared with four state-of-the-art curiosity-motivated methods on a sparse-reward robotic manipulation task with 35 valid semantic goals, including stacks of two or three objects and pyramids, MIHM shows the fastest learning speed. Moreover, MIHM achieves the highest total success rate, 0.9, versus at most 0.6 for the other methods. Among all the baseline methods, MIHM is the only one that succeeds in stacking three objects.
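The general idea of curiosity-motivated exploration can be illustrated with a minimal sketch. This is not the authors' MIHM (which uses a decoupled mutual-information objective and a hybrid policy mechanism); it is the simpler, classic variant in which a forward model's prediction error serves as an intrinsic reward added to the sparse task reward. All names and constants here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class ForwardModel:
    """Toy linear forward model: predicts the next state from (state, action).

    Its prediction error is the curiosity bonus -- transitions the model
    predicts poorly (i.e., novel ones) earn a larger intrinsic reward.
    """

    def __init__(self, state_dim, action_dim, lr=0.01):
        self.W = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))
        self.lr = lr

    def predict(self, s, a):
        return self.W @ np.concatenate([s, a])

    def update(self, s, a, s_next):
        # One SGD step on the squared prediction error; returns the error
        # measured before the step, which serves as the intrinsic reward.
        x = np.concatenate([s, a])
        err = self.predict(s, a) - s_next
        self.W -= self.lr * np.outer(err, x)
        return float(err @ err)

def shaped_reward(extrinsic, pred_error, beta=0.1):
    """Sparse task reward plus a curiosity bonus, scaled by beta."""
    return extrinsic + beta * pred_error

model = ForwardModel(state_dim=3, action_dim=2)
s, a, s_next = rng.normal(size=3), rng.normal(size=2), rng.normal(size=3)

e1 = model.update(s, a, s_next)   # first visit: large error, large bonus
e2 = model.update(s, a, s_next)   # same transition again: bonus decays
print(e2 < e1)
```

The decaying bonus is what pushes the agent toward unvisited transitions. The paper's argument is that an unconstrained bonus of this kind can also pull the agent into uncontrollable or dangerous states and distract the task policy, which is what MIHM's conservative mutual-information objective and its automatic switch between exploration and exploitation policies are designed to prevent.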


Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8baf/10028088/669812c4c212/fnbot-17-1089270-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8baf/10028088/15fa2dd1a067/fnbot-17-1089270-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8baf/10028088/8c1094a49342/fnbot-17-1089270-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8baf/10028088/d06615e1e92b/fnbot-17-1089270-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8baf/10028088/99684490b64b/fnbot-17-1089270-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8baf/10028088/9d504c78651f/fnbot-17-1089270-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8baf/10028088/a00348e47fa5/fnbot-17-1089270-g0007.jpg

Similar articles

1. Learning robotic manipulation skills with multiple semantic goals by conservative curiosity-motivated exploration.
   Front Neurorobot. 2023 Mar 7;17:1089270. doi: 10.3389/fnbot.2023.1089270. eCollection 2023.
2. AHEGC: Adaptive Hindsight Experience Replay With Goal-Amended Curiosity Module for Robot Control.
   IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):16602-16615. doi: 10.1109/TNNLS.2023.3296765. Epub 2024 Oct 29.
3. LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning.
   Neural Netw. 2023 Oct;167:450-459. doi: 10.1016/j.neunet.2023.08.016. Epub 2023 Aug 22.
4. Task-Oriented Deep Reinforcement Learning for Robotic Skill Acquisition and Control.
   IEEE Trans Cybern. 2021 Feb;51(2):1056-1069. doi: 10.1109/TCYB.2019.2949596. Epub 2021 Jan 15.
5. Learning intraoperative organ manipulation with context-based reinforcement learning.
   Int J Comput Assist Radiol Surg. 2022 Aug;17(8):1419-1427. doi: 10.1007/s11548-022-02630-2. Epub 2022 May 3.
6. Intrinsic Rewards for Exploration Without Harm From Observational Noise: A Simulation Study Based on the Free Energy Principle.
   Neural Comput. 2024 Aug 19;36(9):1854-1885. doi: 10.1162/neco_a_01690.
7. Learning tactile skills through curious exploration.
   Front Neurorobot. 2012 Jul 23;6:6. doi: 10.3389/fnbot.2012.00006. eCollection 2012.
8. Exploration With Task Information for Meta Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4033-4046. doi: 10.1109/TNNLS.2021.3121432. Epub 2023 Aug 4.
9. End-to-End Autonomous Exploration with Deep Reinforcement Learning and Intrinsic Motivation.
   Comput Intell Neurosci. 2021 Dec 16;2021:9945044. doi: 10.1155/2021/9945044. eCollection 2021.
10. Intrinsically motivated oculomotor exploration guided by uncertainty reduction and conditioned reinforcement in non-human primates.
    Sci Rep. 2016 Feb 3;6:20202. doi: 10.1038/srep20202.
