

Adaptive Prior Selection for Repertoire-Based Online Adaptation in Robotics

Authors

Kaushik Rituraj, Desreumaux Pierre, Mouret Jean-Baptiste

Affiliations

Inria, CNRS, Université de Lorraine, Nancy, France.

Publication

Front Robot AI. 2020 Jan 20;6:151. doi: 10.3389/frobt.2019.00151. eCollection 2019.

DOI: 10.3389/frobt.2019.00151
PMID: 33501166
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7805922/
Abstract

Repertoire-based learning is a data-efficient adaptation approach based on a two-step process in which (1) a large and diverse set of policies is learned in simulation, and (2) a planning or learning algorithm chooses the most appropriate policies according to the current situation (e.g., a damaged robot, a new object, etc.). In this paper, we relax the assumption of previous works that a single repertoire is enough for adaptation. Instead, we generate repertoires for many different situations (e.g., with a missing leg, on different floors, etc.) and let our algorithm selects the most useful prior. Our main contribution is an algorithm, APROL (Adaptive Prior selection for Repertoire-based Online Learning) to plan the next action by incorporating these priors when the robot has no information about the current situation. We evaluate APROL on two simulated tasks: (1) pushing unknown objects of various shapes and sizes with a robotic arm and (2) a goal reaching task with a damaged hexapod robot. We compare with "Reset-free Trial and Error" (RTE) and various single repertoire-based baselines. The results show that APROL solves both the tasks in less interaction time than the baselines. Additionally, we demonstrate APROL on a real, damaged hexapod that quickly learns to pick compensatory policies to reach a goal by avoiding obstacles in the path.
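The core loop the abstract describes — maintain several candidate repertoires (priors), score each one by how well it explains the outcomes observed so far, and plan the next action with the currently most plausible prior — can be sketched in miniature. This is an illustrative toy, not the authors' implementation: the repertoires, the scalar "gain" dynamics, and all names here are invented for exposition; APROL itself uses Gaussian-process-corrected repertoires learned by quality-diversity algorithms.

```python
import random

# Toy sketch of APROL-style adaptation (hypothetical setup, not the paper's code).
# Each "repertoire" is a prior predicting the outcome of every policy. Here a
# policy is a scalar command and each prior predicts displacement = gain * command;
# the real robot's gain is unknown to the learner.
REPERTOIRES = {"intact": 1.0, "damaged_leg": 0.5, "slippery_floor": 0.25}
TRUE_GAIN = 0.5          # the robot is secretly in the "damaged_leg" situation
NOISE_STD = 0.05

def execute(command):
    """Simulate the real robot: unknown dynamics plus observation noise."""
    return TRUE_GAIN * command + random.gauss(0.0, NOISE_STD)

def log_likelihood(gain, history):
    """Gaussian log-likelihood of the observed outcomes under one prior."""
    return sum(-((obs - gain * cmd) ** 2) / (2 * NOISE_STD ** 2)
               for cmd, obs in history)

def adapt(goal=1.0, steps=8):
    history = []
    for _ in range(steps):
        # 1) select the prior (repertoire) that best explains the data so far
        best = max(REPERTOIRES, key=lambda r: log_likelihood(REPERTOIRES[r], history))
        # 2) plan with that prior: pick the command it predicts reaches the goal
        command = goal / REPERTOIRES[best]
        outcome = execute(command)
        history.append((command, outcome))
    return best

random.seed(0)
print(adapt())  # the damaged-robot prior should win once evidence accumulates
```

The point of the sketch is the interplay the paper exploits: no single prior needs to be right from the start, because every interaction simultaneously makes progress toward the goal and sharpens the evidence about which situation the robot is actually in.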


Figures (frobt-06-00151, g0001–g0008):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e755/7805922/7e1f86858a57/frobt-06-00151-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e755/7805922/5679321e7de8/frobt-06-00151-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e755/7805922/5cbb99fb49b1/frobt-06-00151-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e755/7805922/a032e489eb77/frobt-06-00151-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e755/7805922/6ff6249a4428/frobt-06-00151-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e755/7805922/26dc6df838aa/frobt-06-00151-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e755/7805922/a139da51b328/frobt-06-00151-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e755/7805922/7686557e65ad/frobt-06-00151-g0008.jpg

Similar Articles

1
Adaptive Prior Selection for Repertoire-Based Online Adaptation in Robotics.
Front Robot AI. 2020 Jan 20;6:151. doi: 10.3389/frobt.2019.00151. eCollection 2019.
2
Evolving a Behavioral Repertoire for a Walking Robot.
Evol Comput. 2016 Spring;24(1):59-88. doi: 10.1162/EVCO_a_00143. Epub 2015 Jan 13.
3
SimPLE, a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects.
Sci Robot. 2024 Jun 26;9(91):eadi8808. doi: 10.1126/scirobotics.adi8808.
4
Leveraging Large-Scale Semantic Networks for Adaptive Robot Task Learning and Execution.
Big Data. 2016 Dec;4(4):217-235. doi: 10.1089/big.2016.0038.
5
Evolutionary online behaviour learning and adaptation in real robots.
R Soc Open Sci. 2017 Jul 26;4(7):160938. doi: 10.1098/rsos.160938. eCollection 2017 Jul.
6
Real-time sensory-motor integration of hippocampal place cell replay and prefrontal sequence learning in simulated and physical rat robots for novel path optimization.
Biol Cybern. 2020 Apr;114(2):249-268. doi: 10.1007/s00422-020-00820-2. Epub 2020 Feb 24.
7
A Bayesian Developmental Approach to Robotic Goal-Based Imitation Learning.
PLoS One. 2015 Nov 4;10(11):e0141965. doi: 10.1371/journal.pone.0141965. eCollection 2015.
8
Adaptive and predictive control of a simulated robot arm.
Int J Neural Syst. 2013 Jun;23(3):1350010. doi: 10.1142/S012906571350010X. Epub 2013 Mar 26.
9
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
10
Customizing skills for assistive robotic manipulators, an inverse reinforcement learning approach with error-related potentials.
Commun Biol. 2021 Dec 16;4(1):1406. doi: 10.1038/s42003-021-02891-8.

Cited By

1
An artificial intelligence enabled chemical synthesis robot for exploration and optimization of nanomaterials.
Sci Adv. 2022 Oct 7;8(40):eabo2626. doi: 10.1126/sciadv.abo2626.
2
Improvements in Medical System Safety Analytics for Authentic Measure of Vital Signs Using Fault-Tolerant Design Approach.
Front Med Technol. 2021 Aug 25;3:666671. doi: 10.3389/fmedt.2021.666671. eCollection 2021.
3
Expect the Unexpected: Leveraging the Human-Robot Ecosystem to Handle Unexpected Robot Failures.
Front Robot AI. 2021 Jul 26;8:656385. doi: 10.3389/frobt.2021.656385. eCollection 2021.
4
Evolving the Behavior of Machines: From Micro to Macroevolution.
iScience. 2020 Oct 28;23(11):101731. doi: 10.1016/j.isci.2020.101731. eCollection 2020 Nov 20.

References

1
Mastering the game of Go with deep neural networks and tree search.
Nature. 2016 Jan 28;529(7587):484-9. doi: 10.1038/nature16961.
2
Gaussian Processes for Data-Efficient Learning in Robotics and Control.
IEEE Trans Pattern Anal Mach Intell. 2015 Feb;37(2):408-23. doi: 10.1109/TPAMI.2013.218.
3
Robots that can adapt like animals.
Nature. 2015 May 28;521(7553):503-7. doi: 10.1038/nature14422.
4
Human-level control through deep reinforcement learning.
Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.
5
Evolving a Behavioral Repertoire for a Walking Robot.
Evol Comput. 2016 Spring;24(1):59-88. doi: 10.1162/EVCO_a_00143. Epub 2015 Jan 13.