Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures.

Authors

Günther Johannes, Ady Nadia M, Kearney Alex, Dawson Michael R, Pilarski Patrick M

Affiliations

Department of Computing Science, University of Alberta, Edmonton, AB, Canada.

Alberta Machine Intelligence Institute, Edmonton, AB, Canada.

Publication information

Front Robot AI. 2020 Mar 13;7:34. doi: 10.3389/frobt.2020.00034. eCollection 2020.

DOI:10.3389/frobt.2020.00034
PMID:33501202
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7805647/
Abstract

Predictions and predictive knowledge have seen recent success in improving not only robot control but also other applications ranging from industrial process control to rehabilitation. A property that makes these predictive approaches well-suited for robotics is that they can be learned online and incrementally through interaction with the environment. However, a remaining challenge for many prediction-learning approaches is an appropriate choice of prediction-learning parameters, especially parameters that control the magnitude of a learning machine's updates to its predictions (the step size or learning rate). Typically, these parameters are chosen based on an extensive parameter search, an approach that neither scales well nor is well-suited for tasks that require changing step sizes due to non-stationarity. To begin to address this challenge, we examine the use of online step-size adaptation using the Modular Prosthetic Limb: a sensor-rich robotic arm intended for use by persons with amputations. Our method of choice, Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), learns and adapts step sizes on a feature level; importantly, TIDBD allows step-size tuning and representation learning to occur at the same time. As a first contribution, we show that TIDBD is a practical alternative to classic Temporal-Difference (TD) learning via an extensive parameter search. Both approaches perform comparably in terms of predicting future aspects of a robotic data stream, but TD only achieves comparable performance with a carefully hand-tuned learning rate, while TIDBD uses a robust meta-parameter and tunes its own learning rates. Secondly, our results show that for this particular application TIDBD allows the system to automatically detect patterns characteristic of sensor failures common to a number of robotic applications. As a third contribution, we investigate the sensitivity of classic TD and TIDBD with respect to the initial step-size values on our robotic data set, reaffirming the robustness of TIDBD as shown in previous papers. Together, these results promise to improve the ability of robotic devices to learn from interactions with their environments in a robust way, providing key capabilities for autonomous agents and robots.
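
The abstract does not spell out the update rule, so the following is only a rough sketch of the general idea it describes: a linear TD(λ) prediction learner that keeps one step size per feature and adapts each of them online with an IDBD-style meta-gradient update. The class name, method names, default values, and the toy constant-signal stream are all hypothetical choices made for illustration; the exact algorithm and settings used in the paper may differ.

```python
import math


class StepSizeAdaptiveTD:
    """Linear TD(lambda) learner with one adapted step size per feature.

    Illustrative IDBD-style sketch only (an assumption, not the paper's code).
    """

    def __init__(self, n_features, init_alpha=0.01, theta=0.01,
                 gamma=0.9, lam=0.9):
        self.w = [0.0] * n_features                      # prediction weights
        self.beta = [math.log(init_alpha)] * n_features  # log step sizes
        self.h = [0.0] * n_features                      # decaying trace of recent weight updates
        self.z = [0.0] * n_features                      # accumulating eligibility traces
        self.theta = theta                               # meta step size
        self.gamma = gamma                               # discount factor
        self.lam = lam                                   # trace decay

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def step_sizes(self):
        return [math.exp(b) for b in self.beta]

    def update(self, x, signal, x_next):
        """One online update from features x, observed signal, next features."""
        delta = signal + self.gamma * self.predict(x_next) - self.predict(x)
        for i, xi in enumerate(x):
            # Meta-gradient step on the log step size of feature i.
            self.beta[i] += self.theta * delta * xi * self.h[i]
            alpha = math.exp(self.beta[i])
            # Standard TD(lambda) trace and weight update, using alpha_i.
            self.z[i] = self.gamma * self.lam * self.z[i] + xi
            self.w[i] += alpha * delta * self.z[i]
            # Decay h (clipped at zero) and add the latest weight change.
            decay = max(0.0, 1.0 - alpha * xi * self.z[i])
            self.h[i] = self.h[i] * decay + alpha * delta * self.z[i]
        return delta


def run_constant_stream(steps=5000):
    """Toy stream: one always-active feature and a constant signal of 1."""
    learner = StepSizeAdaptiveTD(n_features=1)
    for _ in range(steps):
        learner.update([1.0], 1.0, [1.0])
    return learner
```

On this toy stream the discounted sum of future signals is 1 / (1 - 0.9) = 10, so the learned prediction should approach 10 while the step size grows from its initial value without any hand tuning. Setting `theta=0.0` freezes the step sizes and reduces the sketch to ordinary fixed-step-size TD(λ), which is the baseline the abstract compares against.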

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f03/7805647/26e2bb21df58/frobt-07-00034-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f03/7805647/26c406c74da5/frobt-07-00034-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f03/7805647/562bf6030c46/frobt-07-00034-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f03/7805647/b78ce55a0864/frobt-07-00034-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f03/7805647/add55aef78ec/frobt-07-00034-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f03/7805647/e3e9d03dbd2c/frobt-07-00034-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f03/7805647/1c6b06382064/frobt-07-00034-g0007.jpg
