Behavior policy learning: Learning multi-stage tasks solution sketches and model-based controllers.

Authors

Tsinganos Konstantinos, Chatzilygeroudis Konstantinos, Hadjivelichkov Denis, Komninos Theodoros, Dermatas Evangelos, Kanoulas Dimitrios

Affiliations

Department of Computer Engineering and Informatics (CEID), University of Patras, Patras, Greece.

Computer Technology Institute and Press "Diophantus" (CTI), Patras, Greece.

Publication

Front Robot AI. 2022 Oct 12;9:974537. doi: 10.3389/frobt.2022.974537. eCollection 2022.

DOI: 10.3389/frobt.2022.974537
PMID: 36313244
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9597635/
Abstract

Multi-stage tasks are a challenge for reinforcement learning methods, and require either specific task knowledge (e.g., task segmentation) or a large amount of interaction time to be learned. In this paper, we propose Behavior Policy Learning (BPL), which effectively combines 1) only a few solution sketches, that is, demonstrations containing only the states and not the actions, 2) model-based controllers, and 3) simulations to solve multi-stage tasks without strong knowledge about the underlying task. Our main intuition is that solution sketches alone can provide strong data for learning a high-level trajectory by imitation, and model-based controllers can be used to follow this trajectory (which we call a behavior) effectively. Finally, we utilize robotic simulations to further improve the policy and make it robust in a Sim2Real style. We evaluate our method in simulation with a robotic manipulator that has to perform two tasks with variations: 1) grasp a box and place it in a basket, and 2) re-place a book on a different level within a bookcase. We also validate the Sim2Real capabilities of our method by performing real-world experiments and realistic simulated experiments where, for the first task, the objects are tracked through an RGB-D camera.
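The abstract's three-part pipeline (state-only sketches, an imitated high-level trajectory, and a low-level controller that tracks it) can be illustrated with a deliberately minimal sketch. This is a hypothetical toy in 1-D, not the authors' implementation: the "behavior" is simply the mean of the demonstrated state trajectories, and a proportional tracker on trivial integrator dynamics stands in for the model-based controller.

```python
import numpy as np

# Toy illustration of the BPL idea (illustrative only; not the paper's code).
# 1) Solution sketches are state-only demonstrations.
# 2) Imitation distills them into a high-level desired trajectory (the "behavior").
# 3) A low-level controller tracks that trajectory step by step.

rng = np.random.default_rng(0)

# Three noisy 1-D solution sketches: the state moves from 0 toward a goal at 1.0.
sketches = np.stack(
    [np.linspace(0.0, 1.0, 20) + rng.normal(0.0, 0.01, 20) for _ in range(3)]
)

# "Behavior": the imitated high-level trajectory; here, just the mean sketch.
behavior = sketches.mean(axis=0)

# Stand-in for the model-based controller: a proportional tracker
# on trivial integrator dynamics s' = s + a.
def track(reference, s0=0.0, gain=0.8):
    s = s0
    for target in reference:
        s += gain * (target - s)  # action a = gain * (target - s)
    return s

final_state = track(behavior)
print(f"final state: {final_state:.3f}")  # ends close to the demonstrated goal (1.0)
```

In the paper the same division of labor holds at much higher dimension: imitation only has to produce the desired state sequence, while trajectory tracking is delegated to a controller that needs no learned action labels.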


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/d6a872e99855/frobt-09-974537-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/19f027050394/frobt-09-974537-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/9b26ec63fabf/frobt-09-974537-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/dc47e981bf4f/frobt-09-974537-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/720e36048e6b/frobt-09-974537-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/6640b9a36135/frobt-09-974537-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/a4874ea9c72e/frobt-09-974537-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/d11b85f18a5b/frobt-09-974537-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/4fc226f49faa/frobt-09-974537-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/0d4dd00bee38/frobt-09-974537-g010.jpg

Similar Articles

1. Behavior policy learning: Learning multi-stage tasks solution sketches and model-based controllers.
Front Robot AI. 2022 Oct 12;9:974537. doi: 10.3389/frobt.2022.974537. eCollection 2022.
2. Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks With Base Controllers.
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4072-4081. doi: 10.1109/TNNLS.2022.3201705. Epub 2024 Feb 29.
3. Hybrid Imitation Learning Framework for Robotic Manipulation Tasks.
Sensors (Basel). 2021 May 13;21(10):3409. doi: 10.3390/s21103409.
4. Bayesian Disturbance Injection: Robust imitation learning of flexible policies for robot manipulation.
Neural Netw. 2023 Jan;158:42-58. doi: 10.1016/j.neunet.2022.11.008. Epub 2022 Nov 11.
5. An Adaptive Imitation Learning Framework for Robotic Complex Contact-Rich Insertion Tasks.
Front Robot AI. 2022 Jan 11;8:777363. doi: 10.3389/frobt.2021.777363. eCollection 2021.
6. Deep imitation learning for 3D navigation tasks.
Neural Comput Appl. 2018;29(7):389-404. doi: 10.1007/s00521-017-3241-z. Epub 2017 Dec 4.
7. BAGAIL: Multi-modal imitation learning from imbalanced demonstrations.
Neural Netw. 2024 Jun;174:106251. doi: 10.1016/j.neunet.2024.106251. Epub 2024 Mar 19.
8. Hierarchical Tactile-Based Control Decomposition of Dexterous In-Hand Manipulation Tasks.
Front Robot AI. 2020 Nov 19;7:521448. doi: 10.3389/frobt.2020.521448. eCollection 2020.
9. Extended residual learning with one-shot imitation learning for robotic assembly in semi-structured environment.
Front Neurorobot. 2024 Apr 29;18:1355170. doi: 10.3389/fnbot.2024.1355170. eCollection 2024.
10. Imitation and mirror systems in robots through Deep Modality Blending Networks.
Neural Netw. 2022 Feb;146:22-35. doi: 10.1016/j.neunet.2021.11.004. Epub 2021 Nov 16.
