Dopamine neurons learn to encode the long-term value of multiple future rewards.

Affiliations

Department of Physiology, Kyoto Prefectural University of Medicine, Kyoto 602-8566, Japan.

Publication information

Proc Natl Acad Sci U S A. 2011 Sep 13;108(37):15462-7. doi: 10.1073/pnas.1014457108. Epub 2011 Sep 6.

Abstract

Midbrain dopamine neurons signal reward value, their prediction error, and the salience of events. If they play a critical role in achieving specific distant goals, long-term future rewards should also be encoded as suggested in reinforcement learning theories. Here, we address this experimentally untested issue. We recorded 185 dopamine neurons in three monkeys that performed a multistep choice task in which they explored a reward target among alternatives and then exploited that knowledge to receive one or two additional rewards by choosing the same target in a set of subsequent trials. An analysis of anticipatory licking for reward water indicated that the monkeys did not anticipate an immediately expected reward in individual trials; rather, they anticipated the sum of immediate and multiple future rewards. In accordance with this behavioral observation, the dopamine responses to the start cues and reinforcer beeps reflected the expected values of the multiple future rewards and their errors, respectively. More specifically, when monkeys learned the multistep choice task over the course of several weeks, the responses of dopamine neurons encoded the sum of the immediate and expected multiple future rewards. The dopamine responses were quantitatively predicted by theoretical descriptions of the value function with time discounting in reinforcement learning. These findings demonstrate that dopamine neurons learn to encode the long-term value of multiple future rewards with distant rewards discounted.
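For reference, the "value function with time discounting" invoked above is, in its standard temporal-difference form, the expected discounted sum of future rewards. The notation below is the common textbook form (discount factor \gamma, reward r_t), given here as a sketch of the idea rather than the exact model fit in the paper:

V_t = \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \right], \qquad 0 < \gamma < 1,

so the value signaled at the start cue combines the immediate reward with all expected future rewards, distant ones weighted down geometrically. The phasic response to the reinforcer beep then corresponds to the temporal-difference prediction error

\delta_t = r_t + \gamma V_{t+1} - V_t.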



