


Generalize Robot Learning From Demonstration to Variant Scenarios With Evolutionary Policy Gradient.

Authors

Cao Junjie, Liu Weiwei, Liu Yong, Yang Jian

Affiliations

Institute of Cyber Systems and Control, Zhejiang University, Hangzhou, China.

China Research and Development Academy of Machinery Equipment, Beijing, China.

Publication

Front Neurorobot. 2020 Apr 21;14:21. doi: 10.3389/fnbot.2020.00021. eCollection 2020.

DOI: 10.3389/fnbot.2020.00021
PMID: 32372940
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7188386/
Abstract

There has been substantial growth in research on robot automation, which aims to make robots capable of directly interacting with the world or with humans. Robot learning from human demonstration is central to such automation. However, dependence on demonstration restricts the robot to a fixed scenario, without the ability to explore variant situations to accomplish the same task as in the demonstration. Deep reinforcement learning may enable robots to learn beyond human demonstration and fulfill tasks in unknown situations. Exploration is the core of such generalization to different environments, but exploration in reinforcement learning can be ineffective and suffers from low sample efficiency. In this paper, we present Evolutionary Policy Gradient (EPG), which lets a robot learn from demonstration and perform goal-oriented exploration efficiently. Through goal-oriented exploration, our method can generalize a learned skill to environments with different parameters. EPG combines parameter perturbation with a policy gradient method in the framework of Evolutionary Algorithms (EAs), fusing the benefits of both to achieve effective and efficient exploration. With demonstration guiding the evolutionary process, the robot can accelerate goal-oriented exploration and generalize its capability to variant scenarios. Experiments on robot control tasks in OpenAI Gym with dense and sparse rewards show that EPG provides competitive performance against the original policy gradient methods and EAs. In the manipulator task, our robot learns to open a door with vision in environments different from those in which the demonstrations were provided.
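The abstract describes EPG as interleaving EA-style parameter perturbation with policy-gradient refinement. The following is a minimal sketch of that loop, not the authors' implementation: a toy quadratic "episode return" stands in for environment rollouts, a finite-difference step stands in for the policy-gradient update, and all names and hyperparameters (`pop_size`, `sigma`, `elite_frac`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(theta):
    # Toy stand-in for an episode return: reward peaks at theta == goal.
    goal = np.array([1.0, -2.0, 0.5])
    return -np.sum((theta - goal) ** 2)

def policy_gradient_step(theta, lr=0.05):
    # Stand-in for an off-the-shelf policy-gradient update; here a
    # finite-difference gradient ascent step on the toy return.
    eps = 1e-4
    grad = np.array([
        (evaluate(theta + eps * e) - evaluate(theta - eps * e)) / (2 * eps)
        for e in np.eye(theta.size)
    ])
    return theta + lr * grad

def epg(theta0, pop_size=8, sigma=0.1, elite_frac=0.5, iters=50):
    """EPG-style loop: each generation perturbs the current policy
    parameters (EA exploration), refines every candidate with one
    policy-gradient step, then keeps and recombines the elites."""
    theta = theta0
    for _ in range(iters):
        # EA-style parameter perturbation around the current policy.
        population = [theta + sigma * rng.standard_normal(theta.size)
                      for _ in range(pop_size)]
        # Refine each candidate with a policy-gradient step.
        population = [policy_gradient_step(p) for p in population]
        # Select elites by return and recombine as their mean.
        scores = np.array([evaluate(p) for p in population])
        elites = np.argsort(scores)[::-1][:max(1, int(elite_frac * pop_size))]
        theta = np.mean([population[i] for i in elites], axis=0)
    return theta
```

In the paper's setting the perturbed candidates would each run full environment episodes, and demonstrations would shape the returns that guide selection; the sketch only shows how the two update mechanisms compose.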


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe4d/7188386/9c20c6cc6a25/fnbot-14-00021-g0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe4d/7188386/16f453609af1/fnbot-14-00021-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe4d/7188386/75bbec64e5e0/fnbot-14-00021-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe4d/7188386/d90e2507e014/fnbot-14-00021-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe4d/7188386/4c92d1aec56e/fnbot-14-00021-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe4d/7188386/b11fe02bc27c/fnbot-14-00021-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe4d/7188386/3e888f3e2acb/fnbot-14-00021-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe4d/7188386/0a6189051d34/fnbot-14-00021-g0006.jpg

Similar Articles

1
Generalize Robot Learning From Demonstration to Variant Scenarios With Evolutionary Policy Gradient.
Front Neurorobot. 2020 Apr 21;14:21. doi: 10.3389/fnbot.2020.00021. eCollection 2020.
2
Human-robot skills transfer interfaces for a flexible surgical robot.
Comput Methods Programs Biomed. 2014 Sep;116(2):81-96. doi: 10.1016/j.cmpb.2013.12.015. Epub 2014 Jan 8.
3
A reinforcement learning algorithm acquires demonstration from the training agent by dividing the task space.
Neural Netw. 2023 Jul;164:419-427. doi: 10.1016/j.neunet.2023.04.042. Epub 2023 May 5.
4
Robot Motor Skill Transfer With Alternate Learning in Two Spaces.
IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4553-4564. doi: 10.1109/TNNLS.2020.3021530. Epub 2021 Oct 5.
5
Learning With Stochastic Guidance for Robot Navigation.
IEEE Trans Neural Netw Learn Syst. 2021 Jan;32(1):166-176. doi: 10.1109/TNNLS.2020.2977924. Epub 2021 Jan 4.
6
Robot grasping method optimization using improved deep deterministic policy gradient algorithm of deep reinforcement learning.
Rev Sci Instrum. 2021 Feb 1;92(2):025114. doi: 10.1063/5.0034101.
7
A Task-Learning Strategy for Robotic Assembly Tasks from Human Demonstrations.
Sensors (Basel). 2020 Sep 25;20(19):5505. doi: 10.3390/s20195505.
8
A Multitasking-Oriented Robot Arm Motion Planning Scheme Based on Deep Reinforcement Learning and Twin Synchro-Control.
Sensors (Basel). 2020 Jun 21;20(12):3515. doi: 10.3390/s20123515.
9
Deep Reinforcement Learning-Based Automatic Exploration for Navigation in Unknown Environment.
IEEE Trans Neural Netw Learn Syst. 2020 Jun;31(6):2064-2076. doi: 10.1109/TNNLS.2019.2927869. Epub 2019 Aug 6.
10
An End-to-End Deep Reinforcement Learning-Based Intelligent Agent Capable of Autonomous Exploration in Unknown Environments.
Sensors (Basel). 2018 Oct 22;18(10):3575. doi: 10.3390/s18103575.

Cited By

1
Learning for a Robot: Deep Reinforcement Learning, Imitation Learning, Transfer Learning.
Sensors (Basel). 2021 Feb 11;21(4):1278. doi: 10.3390/s21041278.

References

1
Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer.
Front Neurorobot. 2017 Jan 23;11:1. doi: 10.3389/fnbot.2017.00001. eCollection 2017.
2
Parameter-exploring policy gradients.
Neural Netw. 2010 May;23(4):551-9. doi: 10.1016/j.neunet.2009.12.004. Epub 2009 Dec 16.
3
On learning, representing, and generalizing a task in a humanoid robot.
IEEE Trans Syst Man Cybern B Cybern. 2007 Apr;37(2):286-98. doi: 10.1109/tsmcb.2006.886952.