Improving Robot Motor Learning with Negatively Valenced Reinforcement Signals.

Author information

Navarro-Guerrero Nicolás, Lowe Robert J, Wermter Stefan

Affiliations

Knowledge Technology, Informatics Department, University of Hamburg, Hamburg, Germany.

Division of Cognition and Communication, Department of Applied IT, University of Gothenburg, Gothenburg, Sweden.

Publication information

Front Neurorobot. 2017 Apr 3;11:10. doi: 10.3389/fnbot.2017.00010. eCollection 2017.

DOI:10.3389/fnbot.2017.00010

PMID:28420976

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5376586/

Abstract

Both nociception and punishment signals have been used in robotics. However, the potential of these negatively valenced types of reinforcement learning signals for robot learning has not yet been exploited in detail. Nociceptive signals are primarily used as triggers of preprogrammed action sequences. Punishment signals are typically disembodied, i.e., with little or no relation to the agent-intrinsic limitations, and they are often used to impose behavioral constraints. Here, we provide an alternative approach in which nociceptive signals act as drivers of learning rather than simple triggers of preprogrammed behavior. Explicitly, we use nociception to expand the state space while we use punishment as a negative reinforcement learning signal. We compare the performance (in terms of task error, the amount of perceived nociception, and the length of learned action sequences) of different neural networks imbued with punishment-based reinforcement signals for inverse kinematic learning. We contrast the performance of a version of the neural network that receives nociceptive inputs with that of a version without such a process. Furthermore, we provide evidence that nociception can improve learning (making the algorithm more robust against network initializations) as well as behavioral performance by reducing the task error, perceived nociception, and length of learned action sequences. Moreover, we provide evidence that punishment, at least as typically used within reinforcement learning applications, may be detrimental in all relevant metrics.
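The distinction the abstract draws between nociception as an extra state input and punishment as a negative reward term can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear joint-limit model of nociception, the `margin` parameter, and the `punishment_weight` parameter are all hypothetical and chosen only for illustration.

```python
def nociception(joint_angles, joint_limits, margin=0.1):
    """Hypothetical nociceptive signal per joint (an assumption, not from the paper).

    Returns 0.0 while a joint is at least `margin` radians away from both
    limits, rising linearly to 1.0 as the joint reaches a limit.
    """
    signals = []
    for angle, (lower, upper) in zip(joint_angles, joint_limits):
        dist = min(angle - lower, upper - angle)  # distance to nearest limit
        signals.append(min(1.0, max(0.0, (margin - dist) / margin)))
    return signals


def augmented_state(joint_angles, target, joint_limits):
    """Expand the state with nociceptive inputs: [joints, target, nociception]."""
    return list(joint_angles) + list(target) + nociception(joint_angles, joint_limits)


def reward(task_error, noci, punishment_weight=1.0):
    """Reward with a punishment term: negative task error minus weighted nociception."""
    return -task_error - punishment_weight * sum(noci)


limits = [(-1.0, 1.0), (-1.0, 1.0)]   # hypothetical joint limits in radians
angles = [0.0, 1.0]                   # second joint sits exactly on its limit
noci = nociception(angles, limits)
state = augmented_state(angles, [0.5, 0.5], limits)
r = reward(0.25, noci, punishment_weight=2.0)
```

In this sketch a learner could receive the nociceptive inputs without the punishment term (set `punishment_weight=0.0`), the punishment term without the inputs (drop the last entries of the state), or both, which mirrors the kind of comparison the abstract describes.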

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/46c9/5376586/6b39bf4ef2d5/fnbot-11-00010-g001.jpg
