Dopamine neurons learn to encode the long-term value of multiple future rewards.

Affiliations

Department of Physiology, Kyoto Prefectural University of Medicine, Kyoto 602-8566, Japan.

Publication information

Proc Natl Acad Sci U S A. 2011 Sep 13;108(37):15462-7. doi: 10.1073/pnas.1014457108. Epub 2011 Sep 6.

Abstract

Midbrain dopamine neurons signal reward value, their prediction error, and the salience of events. If they play a critical role in achieving specific distant goals, long-term future rewards should also be encoded as suggested in reinforcement learning theories. Here, we address this experimentally untested issue. We recorded 185 dopamine neurons in three monkeys that performed a multistep choice task in which they explored a reward target among alternatives and then exploited that knowledge to receive one or two additional rewards by choosing the same target in a set of subsequent trials. An analysis of anticipatory licking for reward water indicated that the monkeys did not anticipate an immediately expected reward in individual trials; rather, they anticipated the sum of immediate and multiple future rewards. In accordance with this behavioral observation, the dopamine responses to the start cues and reinforcer beeps reflected the expected values of the multiple future rewards and their errors, respectively. More specifically, when monkeys learned the multistep choice task over the course of several weeks, the responses of dopamine neurons encoded the sum of the immediate and expected multiple future rewards. The dopamine responses were quantitatively predicted by theoretical descriptions of the value function with time discounting in reinforcement learning. These findings demonstrate that dopamine neurons learn to encode the long-term value of multiple future rewards with distant rewards discounted.
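For reference, the "value function with time discounting" invoked above is, in its standard temporal-difference form, the expected discounted sum of future rewards. The notation below is the common textbook form (discount factor \gamma, reward r_t), given here as a sketch of the idea rather than the exact model fit in the paper:

V_t = \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \right], \qquad 0 < \gamma < 1,

so the value signaled at the start cue combines the immediate reward with all expected future rewards, distant ones weighted down geometrically. The phasic response to the reinforcer beep then corresponds to the temporal-difference prediction error

\delta_t = r_t + \gamma V_{t+1} - V_t.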



