
Dopaminergic Balance between Reward Maximization and Policy Complexity.

Affiliation

The Interdisciplinary Center for Neural Computation, The Hebrew University Jerusalem, Israel.

Publication information

Front Syst Neurosci. 2011 May 9;5:22. doi: 10.3389/fnsys.2011.00022. eCollection 2011.

DOI: 10.3389/fnsys.2011.00022
PMID: 21603228
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3093748/
Abstract

Previous reinforcement-learning models of the basal ganglia network have highlighted the role of dopamine in encoding the mismatch between prediction and reality. Far less attention has been paid to the computational goals and algorithms of the main axis (actor). Here, we construct a top-down model of the basal ganglia with emphasis on the role of dopamine as both a reinforcement-learning signal and a pseudo-temperature signal controlling the general level of basal ganglia excitability and motor vigilance of the acting agent. We argue that the basal ganglia endow the thalamic-cortical networks with the optimal dynamic tradeoff between two constraints: minimizing the policy complexity (cost) and maximizing the expected future reward (gain). We show that this multi-dimensional optimization process results in an experience-modulated version of the softmax behavioral policy. Thus, as in classical softmax behavioral policies, the probabilities of actions are set according to their estimated values and the pseudo-temperature, but they also vary according to how frequently these actions were chosen previously. We conclude that the computational goal of the basal ganglia is not to maximize cumulative (positive and negative) reward. Rather, the basal ganglia aim at optimization of independent gain and cost functions. Unlike previously suggested single-variable maximization processes, this multi-dimensional optimization process leads naturally to a softmax-like behavioral policy. We suggest that beyond its role in the modulation of the efficacy of the cortico-striatal synapses, dopamine directly affects striatal excitability and thus provides a pseudo-temperature signal that modulates the tradeoff between gain and cost. The resulting experience- and dopamine-modulated softmax policy can then serve as a theoretical framework to account for the broad range of behaviors and clinical states governed by the basal ganglia and dopamine systems.
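The abstract describes a softmax policy in which action probabilities depend on estimated values, a pseudo-temperature, and the frequency of previous choices. A minimal Python sketch of one common way to write such a policy follows. It assumes the form pi(a) proportional to prior(a) * exp(beta * Q(a)), which is what maximizing expected value under a penalty on divergence from the prior would give. The abstract does not state the exact equations, so the function names, the inverse pseudo-temperature `beta`, and the running-average update of choice frequencies are all illustrative assumptions, and the mapping from dopamine level to `beta` is left to the full text.

```python
import math


def experience_modulated_softmax(values, prior, beta):
    """Action probabilities proportional to prior[a] * exp(beta * values[a]).

    values: estimated action values (assumed given, e.g. by a critic).
    prior:  relative frequency of previous choices of each action.
    beta:   hypothetical inverse pseudo-temperature; larger values weight
            estimated value more heavily relative to the prior.
    """
    if len(values) != len(prior):
        raise ValueError("values and prior must have the same length")
    # Subtract the maximum value before exponentiating for numerical stability.
    top = max(values)
    weights = [p * math.exp(beta * (q - top)) for q, p in zip(values, prior)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("prior must give positive weight to some action")
    return [w / total for w in weights]


def classical_softmax(values, beta):
    """Classical softmax: the special case of a uniform prior."""
    n = len(values)
    return experience_modulated_softmax(values, [1.0 / n] * n, beta)


def update_prior(prior, chosen, rate):
    """Move the running choice-frequency estimate toward the chosen action.

    An exponential moving average is an assumption of this sketch, not
    something the abstract specifies.
    """
    return [
        (1.0 - rate) * p + rate * (1.0 if i == chosen else 0.0)
        for i, p in enumerate(prior)
    ]
```

With `beta = 0` the policy simply repeats past choice frequencies, and as `beta` grows it concentrates on the highest-valued action, which illustrates the tradeoff between policy complexity and expected reward that the abstract describes.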


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85f8/3093748/3a411fb750f3/fnsys-05-00022-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85f8/3093748/bb5e8baeea15/fnsys-05-00022-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85f8/3093748/14d820b098d7/fnsys-05-00022-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85f8/3093748/8532ca36dd65/fnsys-05-00022-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85f8/3093748/d03eceab1b33/fnsys-05-00022-g004.jpg


