Computational models of reinforcement learning: the role of dopamine as a reward signal.

Publication information

Cogn Neurodyn. 2010 Jun;4(2):91-105. doi: 10.1007/s11571-010-9109-x. Epub 2010 Mar 21.

DOI: 10.1007/s11571-010-9109-x
PMID: 21629583
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2866366/
Abstract

Reinforcement learning is ubiquitous. Unlike other forms of learning, it involves the processing of fast yet content-poor feedback information to correct assumptions about the nature of a task or of a set of stimuli. This feedback information is often delivered as generic rewards or punishments, and has little to do with the stimulus features to be learned. How can such low-content feedback lead to such an efficient learning paradigm? Through a review of existing neuro-computational models of reinforcement learning, we suggest that the efficiency of this type of learning resides in the dynamic and synergistic cooperation of brain systems that use different levels of computations. The implementation of reward signals at the synaptic, cellular, network and system levels gives the organism the necessary robustness, adaptability and processing speed required for evolutionary and behavioral success.



