Kaushik Rituraj, Desreumaux Pierre, Mouret Jean-Baptiste
Inria, CNRS, Université de Lorraine, Nancy, France.
Front Robot AI. 2020 Jan 20;6:151. doi: 10.3389/frobt.2019.00151. eCollection 2019.
Repertoire-based learning is a data-efficient adaptation approach based on a two-step process in which (1) a large and diverse set of policies is learned in simulation, and (2) a planning or learning algorithm chooses the most appropriate policies according to the current situation (e.g., a damaged robot, a new object, etc.). In this paper, we relax the assumption of previous works that a single repertoire is enough for adaptation. Instead, we generate repertoires for many different situations (e.g., with a missing leg, on different floors, etc.) and let our algorithm select the most useful prior. Our main contribution is an algorithm, APROL (Adaptive Prior selection for Repertoire-based Online Learning), that plans the next action by incorporating these priors when the robot has no information about the current situation. We evaluate APROL on two simulated tasks: (1) pushing unknown objects of various shapes and sizes with a robotic arm and (2) a goal-reaching task with a damaged hexapod robot. We compare with "Reset-free Trial and Error" (RTE) and various single-repertoire baselines. The results show that APROL solves both tasks in less interaction time than the baselines. Additionally, we demonstrate APROL on a real, damaged hexapod that quickly learns to pick compensatory policies to reach a goal while avoiding obstacles in its path.
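The core idea of selecting the most useful prior can be illustrated with a minimal sketch. This is not the paper's implementation (APROL uses Gaussian process models over elite policies from MAP-Elites repertoires); it is a toy version under simplified assumptions, where each repertoire is reduced to a linear forward model with a hypothetical `offset` parameter, and the prior that best explains the observed outcomes (under an i.i.d. Gaussian likelihood) is selected:

```python
import numpy as np

class RepertoirePrior:
    """Toy stand-in for one repertoire: a policy-to-outcome model from simulation."""
    def __init__(self, offset):
        self.offset = offset  # hypothetical systematic bias of this repertoire

    def predict(self, policy):
        # Predicted outcome for a scalar policy parameter (toy linear model).
        return policy + self.offset

def select_prior(repertoires, history, sigma=0.1):
    """Pick the repertoire whose predictions best explain real-robot observations.

    history: list of (policy, observed_outcome) pairs collected online.
    Returns the index of the most likely prior.
    """
    log_liks = []
    for rep in repertoires:
        ll = 0.0
        for policy, outcome in history:
            err = outcome - rep.predict(policy)
            ll += -0.5 * (err / sigma) ** 2  # Gaussian log-density, constant dropped
        log_liks.append(ll)
    return int(np.argmax(log_liks))

# A robot whose dynamics match the second repertoire (offset 0.5):
repertoires = [RepertoirePrior(0.0), RepertoirePrior(0.5)]
history = [(0.1, 0.6), (0.2, 0.7)]
best = select_prior(repertoires, history)  # selects index 1
```

In the full algorithm, this selection runs at every episode, so the robot can switch priors as evidence accumulates about the (unknown) situation, e.g., which leg is damaged.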