Midbrain Dopamine Neurons Signal Belief in Choice Accuracy during a Perceptual Decision.

Affiliations

Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; UCL Institute of Ophthalmology, University College London, London EC1V 9EL, UK.

Brain Science Institute, Tamagawa University, Machida, Tokyo 194-8610, Japan; Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Av. de Brasilia, 1400-038 Lisbon, Portugal.

Publication information

Curr Biol. 2017 Mar 20;27(6):821-832. doi: 10.1016/j.cub.2017.02.026. Epub 2017 Mar 9.

DOI:10.1016/j.cub.2017.02.026

PMID:28285994

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5819757/

Abstract

Central to the organization of behavior is the ability to predict the values of outcomes to guide choices. The accuracy of such predictions is honed by a teaching signal that indicates how incorrect a prediction was ("reward prediction error," RPE). In several reinforcement learning contexts, such as Pavlovian conditioning and decisions guided by reward history, this RPE signal is provided by midbrain dopamine neurons. In many situations, however, the stimuli predictive of outcomes are perceptually ambiguous. Perceptual uncertainty is known to influence choices, but it has been unclear whether or how dopamine neurons factor it into their teaching signal. To cope with uncertainty, we extended a reinforcement learning model with a belief state about the perceptually ambiguous stimulus; this model generates an estimate of the probability of choice correctness, termed decision confidence. We show that dopamine responses in monkeys performing a perceptually ambiguous decision task comply with the model's predictions. Consequently, dopamine responses did not simply reflect a stimulus' average expected reward value but were predictive of the trial-to-trial fluctuations in perceptual accuracy. These confidence-dependent dopamine responses emerged prior to monkeys' choice initiation, raising the possibility that dopamine impacts impending decisions, in addition to encoding a post-decision teaching signal. Finally, by manipulating reward size, we found that dopamine neurons reflect both the upcoming reward size and the confidence in achieving it. Together, our results show that dopamine responses convey teaching signals that are also appropriate for perceptual decisions.
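
The belief-state idea in the abstract can be illustrated with a toy simulation. The following is a minimal sketch, not the authors' model: it assumes Gaussian sensory noise, a flat prior over the stimulus, and a value prediction equal to confidence times reward size. All names (`simulate_trial`, `run`, `mean_confidence`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import NamedTuple


class Trial(NamedTuple):
    confidence: float       # belief that the choice is correct, in [0.5, 1]
    correct: bool           # whether the choice matched the true stimulus side
    predicted_value: float  # confidence * reward_size (assumed value estimate)
    outcome_rpe: float      # reward received minus predicted value


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_trial(stimulus, noise_sd, reward_size, rng) -> Trial:
    """One two-choice trial; stimulus > 0 means 'right' is the correct side."""
    # The agent only sees a noisy percept of the stimulus.
    percept = stimulus + rng.gauss(0.0, noise_sd)
    choice_right = percept >= 0.0
    # Assumption: with a flat prior, the belief that the chosen side is
    # correct is the posterior mass on that side of zero.
    confidence = normal_cdf(abs(percept) / noise_sd)
    correct = choice_right == (stimulus > 0.0)
    reward = reward_size if correct else 0.0
    predicted_value = confidence * reward_size
    return Trial(confidence, correct, predicted_value, reward - predicted_value)


def run(n_trials=20000, noise_sd=0.2, reward_size=1.0, seed=0):
    """Simulate trials over a fixed set of signed stimulus strengths."""
    rng = random.Random(seed)
    stimuli = [-0.4, -0.2, -0.1, 0.1, 0.2, 0.4]  # illustrative values only
    return [
        simulate_trial(rng.choice(stimuli), noise_sd, reward_size, rng)
        for _ in range(n_trials)
    ]


def mean_confidence(trials, correct):
    """Average confidence over trials with the given outcome."""
    values = [t.confidence for t in trials if t.correct == correct]
    return sum(values) / len(values)


if __name__ == "__main__":
    trials = run()
    print("mean confidence, correct trials:", round(mean_confidence(trials, True), 3))
    print("mean confidence, error trials:  ", round(mean_confidence(trials, False), 3))
```

In this sketch the predicted value varies from trial to trial even for the same stimulus, because it depends on the noisy percept rather than on the stimulus' average reward. It also scales with reward size, which loosely mirrors the abstract's statement that dopamine responses reflect both the upcoming reward size and the confidence in achieving it.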

Similar articles

Dopamine reward prediction error signal codes the temporal evaluation of a perceptual decision report.

Proc Natl Acad Sci U S A. 2017 Nov 28;114(48):E10494-E10503. doi: 10.1073/pnas.1712479114. Epub 2017 Nov 13.

A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.

Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.

The cost of obtaining rewards enhances the reward prediction error signal of midbrain dopamine neurons.

Nat Commun. 2019 Aug 15;10(1):3674. doi: 10.1038/s41467-019-11334-2.

Midbrain dopamine neurons encode a quantitative reward prediction error signal.

Neuron. 2005 Jul 7;47(1):129-41. doi: 10.1016/j.neuron.2005.05.020.

J Neurosci. 2003 Oct 29;23(30):9913-23. doi: 10.1523/JNEUROSCI.23-30-09913.2003.

Transient activation of midbrain dopamine neurons by reward risk.

Neuroscience. 2011 Dec 1;197:162-71. doi: 10.1016/j.neuroscience.2011.09.037. Epub 2011 Sep 22.

Dopamine neurons code subjective sensory experience and uncertainty of perceptual decisions.

Proc Natl Acad Sci U S A. 2011 Dec 6;108(49):19767-71. doi: 10.1073/pnas.1117636108. Epub 2011 Nov 21.

Midbrain dopamine neurons encode decisions for future action.

Nat Neurosci. 2006 Aug;9(8):1057-63. doi: 10.1038/nn1743. Epub 2006 Jul 23.

Adolescent Dopamine Neurons Represent Reward Differently during Action and State Guided Learning.

J Neurosci. 2021 Nov 10;41(45):9419-9430. doi: 10.1523/JNEUROSCI.1321-21.2021. Epub 2021 Oct 5.

Cited by

Nucleus accumbens dopamine release reflects Bayesian inference during instrumental learning.

PLoS Comput Biol. 2025 Jul 2;21(7):e1013226. doi: 10.1371/journal.pcbi.1013226. eCollection 2025 Jul.

What dopamine teaches depends on what the brain believes.

Nat Neurosci. 2025 May 28. doi: 10.1038/s41593-025-01980-9.

Stimulus uncertainty and relative reward rates determine adaptive responding in perceptual decision-making.

PLoS Comput Biol. 2025 May 27;21(5):e1012636. doi: 10.1371/journal.pcbi.1012636. eCollection 2025 May.

Prospective contingency explains behavior and dopamine signals during associative learning.

Nat Neurosci. 2025 Mar 18. doi: 10.1038/s41593-025-01915-4.

Striatal dopamine represents valence on dynamic regional scales.

J Neurosci. 2025 Mar 17;45(17). doi: 10.1523/JNEUROSCI.1551-24.2025.

Understanding learning through uncertainty and bias.

Commun Psychol. 2025 Feb 13;3(1):24. doi: 10.1038/s44271-025-00203-y.

Contextual cues facilitate dynamic value encoding in the mesolimbic dopamine system.

Curr Biol. 2025 Feb 24;35(4):746-760.e5. doi: 10.1016/j.cub.2024.12.031. Epub 2025 Jan 23.

The moderating role of COMT gene rs4680 polymorphism between maladaptive metacognitive beliefs and negative symptoms in patients with schizophrenia.

BMC Psychiatry. 2024 Nov 20;24(1):831. doi: 10.1186/s12888-024-06275-0.

Neuronal activity in the ventral tegmental area during goal-directed navigation recorded by low-curvature microelectrode arrays.

Microsyst Nanoeng. 2024 Oct 14;10(1):145. doi: 10.1038/s41378-024-00778-2.

Dopamine neurons encode trial-by-trial subjective reward value in an auction-like task.

Nat Commun. 2024 Sep 17;15(1):8138. doi: 10.1038/s41467-024-52311-8.
