Adaptive Prior Selection for Repertoire-Based Online Adaptation in Robotics

Author Information

Kaushik Rituraj, Desreumaux Pierre, Mouret Jean-Baptiste

Affiliations

Inria, CNRS, Université de Lorraine, Nancy, France.

Publication Information

Front Robot AI. 2020 Jan 20;6:151. doi: 10.3389/frobt.2019.00151. eCollection 2019.

Abstract

Repertoire-based learning is a data-efficient adaptation approach based on a two-step process in which (1) a large and diverse set of policies is learned in simulation, and (2) a planning or learning algorithm chooses the most appropriate policies according to the current situation (e.g., a damaged robot, a new object, etc.). In this paper, we relax the assumption of previous works that a single repertoire is enough for adaptation. Instead, we generate repertoires for many different situations (e.g., with a missing leg, on different floors, etc.) and let our algorithm select the most useful prior. Our main contribution is an algorithm, APROL (Adaptive Prior selection for Repertoire-based Online Learning), that plans the next action by incorporating these priors when the robot has no information about the current situation. We evaluate APROL on two simulated tasks: (1) pushing unknown objects of various shapes and sizes with a robotic arm, and (2) a goal-reaching task with a damaged hexapod robot. We compare APROL with "Reset-free Trial and Error" (RTE) and various single-repertoire baselines. The results show that APROL solves both tasks in less interaction time than the baselines. Additionally, we demonstrate APROL on a real, damaged hexapod that quickly learns to pick compensatory policies to reach a goal while avoiding obstacles in its path.
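The abstract describes APROL only at a high level. As an illustration of the underlying idea (scoring each candidate repertoire prior against the outcomes the robot actually observes, then acting with the best-matching one), here is a minimal Python sketch. The repertoire representation, the Gaussian noise model, the maximum-likelihood prior selection, and the greedy policy choice are all simplifying assumptions for illustration, not the authors' implementation.

import numpy as np

# A "repertoire" is assumed here to be a dict mapping a policy id to the
# outcome (e.g., a 2-D displacement) that policy produced in one simulated
# situation. The paper generates such repertoires in simulation; random
# vectors stand in for them below.

def log_likelihood(repertoire, observations, noise_std=0.05):
    # Log-likelihood of observed (policy_id, outcome) pairs under a
    # repertoire prior, assuming isotropic Gaussian observation noise.
    ll = 0.0
    for policy_id, observed in observations:
        residual = observed - repertoire[policy_id]
        ll += -0.5 * float(residual @ residual) / noise_std**2
    return ll

def select_prior(repertoires, observations):
    # Pick the repertoire whose predictions best explain the data so far
    # (a maximum-likelihood stand-in for APROL's prior selection).
    scores = [log_likelihood(r, observations) for r in repertoires]
    return repertoires[int(np.argmax(scores))]

def choose_policy(repertoire, position, goal):
    # Greedily pick the policy whose predicted outcome brings the robot
    # closest to the goal (the paper plans rather than acting greedily).
    return min(repertoire,
               key=lambda pid: np.linalg.norm(position + repertoire[pid] - goal))

# Toy episode: two candidate priors; the robot is secretly "damaged", so
# outcomes follow the damaged repertoire, which the selection discovers.
rng = np.random.default_rng(0)
rep_intact = {i: rng.normal(size=2) for i in range(50)}
rep_damaged = {i: 0.5 * v for i, v in rep_intact.items()}

position, goal, observations = np.zeros(2), np.array([3.0, 1.0]), []
for step in range(10):
    prior = (select_prior([rep_intact, rep_damaged], observations)
             if observations else rep_intact)
    pid = choose_policy(prior, position, goal)
    outcome = rep_damaged[pid] + rng.normal(scale=0.02, size=2)
    observations.append((pid, outcome))
    position = position + outcome

The full method additionally corrects each prior's predictions online from the observed outcomes and plans the next action rather than acting greedily; the sketch keeps only the prior-selection idea.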


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e755/7805922/7e1f86858a57/frobt-06-00151-g0001.jpg
