Behavior policy learning: Learning multi-stage tasks solution sketches and model-based controllers.

Authors

Tsinganos Konstantinos, Chatzilygeroudis Konstantinos, Hadjivelichkov Denis, Komninos Theodoros, Dermatas Evangelos, Kanoulas Dimitrios

Affiliations

Department of Computer Engineering and Informatics (CEID), University of Patras, Patras, Greece.

Computer Technology Institute and Press "Diophantus" (CTI), Patras, Greece.

Publication

Front Robot AI. 2022 Oct 12;9:974537. doi: 10.3389/frobt.2022.974537. eCollection 2022.

DOI: 10.3389/frobt.2022.974537
PMID: 36313244
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9597635/
Abstract

Multi-stage tasks are a challenge for reinforcement learning methods, and require either specific task knowledge (e.g., task segmentation) or a large amount of interaction time to be learned. In this paper, we propose Behavior Policy Learning (BPL), which effectively combines 1) only a few solution sketches, that is, demonstrations containing only the states and not the actions, 2) model-based controllers, and 3) simulations to solve multi-stage tasks without strong knowledge about the underlying task. Our main intuition is that solution sketches alone can provide strong data for learning a high-level trajectory by imitation, and model-based controllers can be used to follow this trajectory (which we call a behavior) effectively. Finally, we utilize robotic simulations to further improve the policy and make it robust in a Sim2Real style. We evaluate our method in simulation with a robotic manipulator that has to perform two tasks with variations: 1) grasp a box and place it in a basket, and 2) re-place a book on a different level within a bookcase. We also validate the Sim2Real capabilities of our method by performing real-world experiments and realistic simulated experiments where, for the first task, the objects are tracked through an RGB-D camera.
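The abstract's three-part pipeline (state-only sketches, an imitated high-level trajectory, and a low-level controller that tracks it) can be illustrated with a deliberately minimal sketch. This is a hypothetical toy in 1-D, not the authors' implementation: the "behavior" is simply the mean of the demonstrated state trajectories, and a proportional tracker on trivial integrator dynamics stands in for the model-based controller.

```python
import numpy as np

# Toy illustration of the BPL idea (illustrative only; not the paper's code).
# 1) Solution sketches are state-only demonstrations.
# 2) Imitation distills them into a high-level desired trajectory (the "behavior").
# 3) A low-level controller tracks that trajectory step by step.

rng = np.random.default_rng(0)

# Three noisy 1-D solution sketches: the state moves from 0 toward a goal at 1.0.
sketches = np.stack(
    [np.linspace(0.0, 1.0, 20) + rng.normal(0.0, 0.01, 20) for _ in range(3)]
)

# "Behavior": the imitated high-level trajectory; here, just the mean sketch.
behavior = sketches.mean(axis=0)

# Stand-in for the model-based controller: a proportional tracker
# on trivial integrator dynamics s' = s + a.
def track(reference, s0=0.0, gain=0.8):
    s = s0
    for target in reference:
        s += gain * (target - s)  # action a = gain * (target - s)
    return s

final_state = track(behavior)
print(f"final state: {final_state:.3f}")  # ends close to the demonstrated goal (1.0)
```

In the paper the same division of labor holds at much higher dimension: imitation only has to produce the desired state sequence, while trajectory tracking is delegated to a controller that needs no learned action labels.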


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/d6a872e99855/frobt-09-974537-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/19f027050394/frobt-09-974537-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/9b26ec63fabf/frobt-09-974537-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/dc47e981bf4f/frobt-09-974537-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/720e36048e6b/frobt-09-974537-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/6640b9a36135/frobt-09-974537-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/a4874ea9c72e/frobt-09-974537-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/d11b85f18a5b/frobt-09-974537-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/4fc226f49faa/frobt-09-974537-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/0d4dd00bee38/frobt-09-974537-g010.jpg

Similar Articles

1. Behavior policy learning: Learning multi-stage tasks solution sketches and model-based controllers.
Front Robot AI. 2022 Oct 12;9:974537. doi: 10.3389/frobt.2022.974537. eCollection 2022.
2. Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks With Base Controllers.
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4072-4081. doi: 10.1109/TNNLS.2022.3201705. Epub 2024 Feb 29.
3. Hybrid Imitation Learning Framework for Robotic Manipulation Tasks.
Sensors (Basel). 2021 May 13;21(10):3409. doi: 10.3390/s21103409.
4. Bayesian Disturbance Injection: Robust imitation learning of flexible policies for robot manipulation.
Neural Netw. 2023 Jan;158:42-58. doi: 10.1016/j.neunet.2022.11.008. Epub 2022 Nov 11.
5. An Adaptive Imitation Learning Framework for Robotic Complex Contact-Rich Insertion Tasks.
Front Robot AI. 2022 Jan 11;8:777363. doi: 10.3389/frobt.2021.777363. eCollection 2021.
6. Deep imitation learning for 3D navigation tasks.
Neural Comput Appl. 2018;29(7):389-404. doi: 10.1007/s00521-017-3241-z. Epub 2017 Dec 4.
7. BAGAIL: Multi-modal imitation learning from imbalanced demonstrations.
Neural Netw. 2024 Jun;174:106251. doi: 10.1016/j.neunet.2024.106251. Epub 2024 Mar 19.
8. Hierarchical Tactile-Based Control Decomposition of Dexterous In-Hand Manipulation Tasks.
Front Robot AI. 2020 Nov 19;7:521448. doi: 10.3389/frobt.2020.521448. eCollection 2020.
9. Extended residual learning with one-shot imitation learning for robotic assembly in semi-structured environment.
Front Neurorobot. 2024 Apr 29;18:1355170. doi: 10.3389/fnbot.2024.1355170. eCollection 2024.
10. Imitation and mirror systems in robots through Deep Modality Blending Networks.
Neural Netw. 2022 Feb;146:22-35. doi: 10.1016/j.neunet.2021.11.004. Epub 2021 Nov 16.
