Learning of sequential movements by neural network model with dopamine-like reinforcement signal.

Authors

Suri R E, Schultz W

Affiliation

Institute of Physiology, University of Fribourg, Switzerland.

Publication

Exp Brain Res. 1998 Aug;121(3):350-4. doi: 10.1007/s002210050467.

DOI:10.1007/s002210050467
PMID:9746140
Abstract

Dopamine neurons appear to code an error in the prediction of reward. They are activated by unpredicted rewards, are not influenced by predicted rewards, and are depressed when a predicted reward is omitted. After conditioning, they respond to reward-predicting stimuli in a similar manner. With these characteristics, the dopamine response strongly resembles the predictive reinforcement teaching signal of neural network models implementing the temporal difference learning algorithm. This study explored a neural network model that used a reward-prediction error signal strongly resembling dopamine responses for learning movement sequences. A different stimulus was presented in each step of the sequence and required a different movement reaction, and reward occurred at the end of the correctly performed sequence. The dopamine-like predictive reinforcement signal efficiently allowed the model to learn long sequences. By contrast, learning with an unconditional reinforcement signal required synaptic eligibility traces of longer and biologically less-plausible durations for obtaining satisfactory performance. Thus, dopamine-like neuronal signals constitute excellent teaching signals for learning sequential behavior.
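The three response patterns listed in the abstract can be reproduced with a temporal difference error, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t). The following is a minimal tabular sketch for illustration, not the authors' neural network model: the function names, the five-step sequence, the learning rate `alpha`, and the discount `gamma` are all assumptions chosen here.

```python
def td_errors(values, rewards, gamma=1.0):
    """TD errors for one trial: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    values[t] is the predicted future reward at step t; the state after
    the final step is terminal and is assumed to have value 0.
    """
    deltas = []
    for t, reward in enumerate(rewards):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(reward + gamma * next_value - values[t])
    return deltas


def train_critic(n_steps=5, n_trials=500, alpha=0.1, gamma=1.0):
    """Learn step values for a sequence rewarded only at its last step."""
    values = [0.0] * n_steps
    rewards = [0.0] * (n_steps - 1) + [1.0]
    for _ in range(n_trials):
        # All errors are computed from the pre-trial values, then applied.
        for t, delta in enumerate(td_errors(values, rewards, gamma)):
            values[t] += alpha * delta
    return values


# Hypothetical five-step sequence: reward only after the last step.
rewarded = [0.0, 0.0, 0.0, 0.0, 1.0]
omitted = [0.0] * 5

naive = [0.0] * 5          # no predictions learned yet
trained = train_critic()   # predictions after repeated rewarded trials

print("unpredicted reward:", td_errors(naive, rewarded)[-1])    # positive
print("predicted reward:  ", td_errors(trained, rewarded)[-1])  # near zero
print("omitted reward:    ", td_errors(trained, omitted)[-1])   # negative
```

With `gamma=1.0` in this sketch, the learned value approaches 1 at every step, so the reward prediction reaches back to the first stimulus of the sequence. That backward propagation is one way to read the abstract's claim that a predictive signal handles long sequences without long eligibility traces.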

