Behavior policy learning: Learning multi-stage tasks via solution sketches and model-based controllers.

Authors

Tsinganos Konstantinos, Chatzilygeroudis Konstantinos, Hadjivelichkov Denis, Komninos Theodoros, Dermatas Evangelos, Kanoulas Dimitrios

Affiliations

Department of Computer Engineering and Informatics (CEID), University of Patras, Patras, Greece.

Computer Technology Institute and Press "Diophantus" (CTI), Patras, Greece.

Publication information

Front Robot AI. 2022 Oct 12;9:974537. doi: 10.3389/frobt.2022.974537. eCollection 2022.

Abstract

Multi-stage tasks are a challenge for reinforcement learning methods: learning them requires either specific task knowledge (e.g., task segmentation) or a large number of interactions. In this paper, we propose Behavior Policy Learning (BPL), which combines 1) only a few solution sketches, that is, demonstrations that contain states but no actions, 2) model-based controllers, and 3) simulations to solve multi-stage tasks without strong knowledge about the underlying task. Our main intuition is that solution sketches alone provide strong data for learning a high-level trajectory by imitation, and that model-based controllers can then follow this trajectory (which we call a behavior) effectively. Finally, we use robotic simulations to further improve the policy and make it robust in a Sim2Real style. We evaluate our method in simulation with a robotic manipulator that has to perform two tasks with variations: 1) grasp a box and place it in a basket, and 2) move a book to a different level within a bookcase. We also validate the Sim2Real capabilities of our method through real-world experiments and realistic simulated experiments in which, for the first task, the objects are tracked with an RGB-D camera.
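The core idea in the abstract (learn a high-level trajectory from state-only demonstrations, then let a model-based controller follow it) can be illustrated with a toy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the 2D point-mass with dynamics s' = s + dt * u, the nearest-neighbour "behavior" lookup, and all function names (`make_sketch`, `fit_behavior`, `behavior`, `controller`, `rollout`). The simulation-based refinement step mentioned in the abstract is not covered.

```python
import math

def make_sketch(start, waypoints, step=0.1):
    """Hypothetical state-only demonstration: straight segments through waypoints."""
    states = [start]
    for wp in waypoints:
        x0, y0 = states[-1]
        n = max(1, round(math.dist((x0, y0), wp) / step))
        for i in range(1, n + 1):
            a = i / n
            states.append((x0 + a * (wp[0] - x0), y0 + a * (wp[1] - y0)))
    return states

def fit_behavior(sketches):
    """Collect (state, next_state) pairs; no actions are stored anywhere."""
    pairs = []
    for states in sketches:
        pairs.extend(zip(states[:-1], states[1:]))
    return pairs

def behavior(pairs, state):
    """Return the demonstrated successor of the closest demonstrated state."""
    _, nxt = min(pairs, key=lambda p: math.dist(p[0], state))
    return nxt

def controller(state, target, dt=0.1, max_speed=2.0):
    """Model-based step for the assumed dynamics s' = s + dt * u (speed-limited)."""
    ux, uy = (target[0] - state[0]) / dt, (target[1] - state[1]) / dt
    speed = math.hypot(ux, uy)
    if speed > max_speed:
        ux, uy = ux * max_speed / speed, uy * max_speed / speed
    return ux, uy

def rollout(pairs, start, goal, dt=0.1, max_steps=200, tol=0.05):
    """Alternate between querying the behavior and applying the controller."""
    state, path = start, [start]
    for _ in range(max_steps):
        if math.dist(state, goal) < tol:
            break
        ux, uy = controller(state, behavior(pairs, state), dt)
        state = (state[0] + dt * ux, state[1] + dt * uy)
        path.append(state)
    return path

# Two sketches of a two-stage task: reach the "object" at (1, 0), then the "basket" at (1, 1).
stages = [(1.0, 0.0), (1.0, 1.0)]
sketches = [make_sketch((0.0, 0.0), stages), make_sketch((0.0, 0.2), stages)]
pairs = fit_behavior(sketches)

# Start from a state that appears in neither sketch.
path = rollout(pairs, start=(0.0, 0.05), goal=(1.0, 1.0))
print(len(path), path[-1])
```

The demonstrations contain only states, so the controller has to supply the actions. In this toy setting it can do so because the dynamics are assumed to be known exactly. A learned function approximator would normally replace the nearest-neighbour lookup, which is used here only to keep the example dependency-free.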

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ba/9597635/d6a872e99855/frobt-09-974537-g001.jpg
