Suppr 超能文献



Skill Learning by Autonomous Robotic Playing Using Active Learning and Exploratory Behavior Composition

Authors

Hangl Simon, Dunjko Vedran, Briegel Hans J, Piater Justus

Affiliations

Intelligent and Interactive Systems, Department of Informatics, University of Innsbruck, Innsbruck, Austria.

LIACS, Leiden University, Leiden, Netherlands.

Publication

Front Robot AI. 2020 Apr 3;7:42. doi: 10.3389/frobt.2020.00042. eCollection 2020.

DOI: 10.3389/frobt.2020.00042
PMID: 33501210
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7806109/
Abstract

We consider the problem of autonomous acquisition of manipulation skills where problem-solving strategies are initially available only for a narrow range of situations. We propose to extend the range of solvable situations by autonomous play with the object. By applying previously-trained skills and behaviors, the robot learns how to prepare situations for which a successful strategy is already known. The information gathered during autonomous play is additionally used to train an environment model. This model is exploited for active learning and the generation of novel preparatory behaviors compositions. We apply our approach to a wide range of different manipulation tasks, e.g., book grasping, grasping of objects of different sizes by selecting different grasping strategies, placement on shelves, and tower disassembly. We show that the composite behavior generation mechanism enables the robot to solve previously-unsolvable tasks, e.g., tower disassembly. We use success statistics gained during real-world experiments to simulate the convergence behavior of our system. Simulation experiments show that the learning speed can be improved by around 30% by using active learning.
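The abstract reports that an environment model learned during play is exploited for active learning, improving learning speed by around 30% in simulation. As a minimal, hypothetical sketch of that general idea only — choosing the next practice situation by outcome uncertainty — the snippet below tracks per-situation success counts and practices where the predicted outcome is least certain. The class name, the situation labels, and the simple success-count model are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch of uncertainty-driven active learning for skill
# practice. All names and the success-count model are hypothetical.

class ActiveSkillLearner:
    def __init__(self, situations):
        # [successes, failures] per situation, starting from a Beta(1,1) prior
        self.counts = {s: [1, 1] for s in situations}

    def uncertainty(self, s):
        a, b = self.counts[s]
        p = a / (a + b)      # estimated success probability
        return p * (1 - p)   # Bernoulli variance: largest when p is near 0.5

    def pick_situation(self):
        # Active-learning step: practice where the outcome is least certain.
        return max(self.counts, key=self.uncertainty)

    def record(self, s, success):
        self.counts[s][0 if success else 1] += 1

learner = ActiveSkillLearner(["book", "box", "tower"])
for _ in range(3):
    learner.record("book", True)   # book grasping looks reliably solvable
learner.record("box", False)       # one failure on the box task
print(learner.pick_situation())    # → tower (still the most uncertain)
```

Compared with uniform random practice, this kind of uncertainty-weighted selection concentrates trials on situations whose success probability is still poorly estimated, which is the mechanism the abstract credits for the faster convergence.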


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2dc5/7806109/b713edd7841f/frobt-07-00042-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2dc5/7806109/fd513e257fde/frobt-07-00042-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2dc5/7806109/62dfca200372/frobt-07-00042-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2dc5/7806109/b303ed19e036/frobt-07-00042-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2dc5/7806109/fd172adf6a88/frobt-07-00042-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2dc5/7806109/9ba2a2f95fb6/frobt-07-00042-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2dc5/7806109/2541acc35be1/frobt-07-00042-g0007.jpg

Similar Articles

1. Skill Learning by Autonomous Robotic Playing Using Active Learning and Exploratory Behavior Composition.
Front Robot AI. 2020 Apr 3;7:42. doi: 10.3389/frobt.2020.00042. eCollection 2020.
2. Learning tactile skills through curious exploration.
Front Neurorobot. 2012 Jul 23;6:6. doi: 10.3389/fnbot.2012.00006. eCollection 2012.
3. RL-DOVS: Reinforcement Learning for Autonomous Robot Navigation in Dynamic Environments.
Sensors (Basel). 2022 May 19;22(10):3847. doi: 10.3390/s22103847.
4. Robot grasping method optimization using improved deep deterministic policy gradient algorithm of deep reinforcement learning.
Rev Sci Instrum. 2021 Feb 1;92(2):025114. doi: 10.1063/5.0034101.
5. Review of Learning-Based Robotic Manipulation in Cluttered Environments.
Sensors (Basel). 2022 Oct 18;22(20):7938. doi: 10.3390/s22207938.
6. Learning-based control approaches for service robots on cloth manipulation and dressing assistance: a comprehensive review.
J Neuroeng Rehabil. 2022 Nov 3;19(1):117. doi: 10.1186/s12984-022-01078-4.
7. Variational Information Bottleneck Regularized Deep Reinforcement Learning for Efficient Robotic Skill Adaptation.
Sensors (Basel). 2023 Jan 9;23(2):762. doi: 10.3390/s23020762.
8. Object Manipulation with an Anthropomorphic Robotic Hand via Deep Reinforcement Learning with a Synergy Space of Natural Hand Poses.
Sensors (Basel). 2021 Aug 5;21(16):5301. doi: 10.3390/s21165301.
9. Learning intraoperative organ manipulation with context-based reinforcement learning.
Int J Comput Assist Radiol Surg. 2022 Aug;17(8):1419-1427. doi: 10.1007/s11548-022-02630-2. Epub 2022 May 3.
10. From the Dexterous Surgical Skill to the Battlefield - A Robotics Exploratory Study.
Mil Med. 2021 Jan 25;186(Suppl 1):288-294. doi: 10.1093/milmed/usaa253.

Cited By

1. Free Energy Projective Simulation (FEPS): Active inference with interpretability.
PLoS One. 2025 Sep 4;20(9):e0331047. doi: 10.1371/journal.pone.0331047. eCollection 2025.
2. How a Minimal Learning Agent can Infer the Existence of Unobserved Variables in a Complex Environment.
Minds Mach (Dordr). 2023;33(1):185-219. doi: 10.1007/s11023-022-09619-5. Epub 2022 Dec 29.
3. Honeybee communication during collective defence is shaped by predation.
BMC Biol. 2021 May 25;19(1):106. doi: 10.1186/s12915-021-01028-x.

References

1. On the convergence of projective-simulation-based reinforcement learning in Markov decision processes.
Quantum Mach Intell. 2020;2(2):13. doi: 10.1007/s42484-020-00023-9. Epub 2020 Nov 5.
2. Modelling collective motion based on the principle of agency: General framework and the case of marching locusts.
PLoS One. 2019 Feb 20;14(2):e0212044. doi: 10.1371/journal.pone.0212044. eCollection 2019.
3. Projective simulation with generalization.
Sci Rep. 2017 Oct 31;7(1):14430. doi: 10.1038/s41598-017-14740-y.
4. A Stochastic Process Model for Free Agency under Indeterminism.
Dialectica (Bern). 2018 Jun;72(2):219-252. doi: 10.1111/1746-8361.12222. Epub 2018 Aug 24.
5. Adaptive quantum computation in changing environments using projective simulation.
Sci Rep. 2015 Aug 11;5:12874. doi: 10.1038/srep12874.
6. Information driven self-organization of complex robotic behaviors.
PLoS One. 2013 May 27;8(5):e63400. doi: 10.1371/journal.pone.0063400. Print 2013.
7. On creative machines and the physical origins of freedom.
Sci Rep. 2012;2:522. doi: 10.1038/srep00522. Epub 2012 Jul 20.
8. A biologically inspired meta-control navigation system for the Psikharpax rat robot.
Bioinspir Biomim. 2012 Jun;7(2):025009. doi: 10.1088/1748-3182/7/2/025009. Epub 2012 May 22.
9. Projective simulation for artificial intelligence.
Sci Rep. 2012;2:400. doi: 10.1038/srep00400. Epub 2012 May 15.
10. Habits, action sequences and reinforcement learning.
Eur J Neurosci. 2012 Apr;35(7):1036-51. doi: 10.1111/j.1460-9568.2012.08050.x.
11. Variants of guided self-organization for robot control.
Theory Biosci. 2012 Sep;131(3):129-37. doi: 10.1007/s12064-011-0141-0. Epub 2011 Nov 25.