
A pallidus-habenula-dopamine pathway signals inferred stimulus values.

Affiliation

Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bldg. 49, Rm. 2A50, Bethesda, Maryland 20892-4435, USA.

Publication Information

J Neurophysiol. 2010 Aug;104(2):1068-76. doi: 10.1152/jn.00158.2010. Epub 2010 Jun 10.

Abstract

The reward value of a stimulus can be learned through two distinct mechanisms: reinforcement learning through repeated stimulus-reward pairings and abstract inference based on knowledge of the task at hand. The reinforcement mechanism is often identified with midbrain dopamine neurons. Here we show that a neural pathway controlling the dopamine system does not rely exclusively on either stimulus-reward pairings or abstract inference but instead uses a combination of the two. We trained monkeys to perform a reward-biased saccade task in which the reward values of two saccade targets were related in a systematic manner. Animals used each trial's reward outcome to learn the values of both targets: the target that had been presented and whose reward outcome had been experienced (experienced value) and the target that had not been presented but whose value could be inferred from the reward statistics of the task (inferred value). We then recorded from three populations of reward-coding neurons: substantia nigra dopamine neurons; a major input to dopamine neurons, the lateral habenula; and neurons that project to the lateral habenula, located in the globus pallidus. All three populations encoded both experienced values and inferred values. In some animals, neurons encoded experienced values more strongly than inferred values, and the animals showed behavioral evidence of learning faster from experience than from inference. Our data indicate that the pallidus-habenula-dopamine pathway signals reward values estimated through both experience and inference.
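The learning described above can be illustrated with a simple delta-rule sketch: each trial's reward outcome drives a prediction-error update of the presented target's value (experienced value), while knowledge of the task's reward structure simultaneously updates the unpresented target's value (inferred value). This is a minimal illustration only; the anticorrelated reward structure, the binary reward coding, and the separate (possibly slower) learning rate for inference are assumptions for exposition, not the model reported in the paper.

```python
class DualUpdateLearner:
    """Delta-rule learner that updates both the experienced target's value
    and the unpresented target's inferred value on every trial.

    Illustrative assumptions: rewards lie in [0, 1] and the two targets'
    values are anticorrelated (if one is high-value, the other is low),
    so the unrewarded outcome can be inferred for the absent target.
    """

    def __init__(self, alpha_exp=0.5, alpha_inf=0.3):
        self.alpha_exp = alpha_exp  # learning rate for experienced value
        self.alpha_inf = alpha_inf  # (possibly slower) rate for inferred value
        self.values = [0.5, 0.5]    # value estimates for targets 0 and 1

    def update(self, target, reward):
        # Experienced value: standard prediction-error update.
        self.values[target] += self.alpha_exp * (reward - self.values[target])
        # Inferred value: the absent target's outcome is deduced from the
        # assumed anticorrelated reward statistics of the task.
        other = 1 - target
        inferred_outcome = 1.0 - reward
        self.values[other] += self.alpha_inf * (inferred_outcome - self.values[other])


learner = DualUpdateLearner()
learner.update(target=0, reward=1.0)  # target 0 presented and rewarded
print(learner.values)  # target 0's value rises; target 1's value falls
```

A weaker inference rate (`alpha_inf < alpha_exp`) reproduces the behavioral pattern reported in some animals, where learning from direct experience is faster than learning from inference.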


Similar Articles

1. A pallidus-habenula-dopamine pathway signals inferred stimulus values.
   J Neurophysiol. 2010 Aug;104(2):1068-76. doi: 10.1152/jn.00158.2010. Epub 2010 Jun 10.
9. A hypothalamus-habenula circuit controls aversion.
   Mol Psychiatry. 2019 Sep;24(9):1351-1368. doi: 10.1038/s41380-019-0369-5. Epub 2019 Feb 12.

Cited By

6. A dopamine mechanism for reward maximization.
   Proc Natl Acad Sci U S A. 2024 May 14;121(20):e2316658121. doi: 10.1073/pnas.2316658121. Epub 2024 May 8.

References

5. Reinforcement learning: the good, the bad and the ugly.
   Curr Opin Neurobiol. 2008 Apr;18(2):185-96. doi: 10.1016/j.conb.2008.08.003. Epub 2008 Aug 22.
6. Task set and prefrontal cortex.
   Annu Rev Neurosci. 2008;31:219-45. doi: 10.1146/annurev.neuro.31.060407.125642.
