
Dopamine reward prediction errors reflect hidden-state inference across time.

Authors

Starkweather Clara Kwon, Babayan Benedicte M, Uchida Naoshige, Gershman Samuel J

Affiliations

Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA.

Center for Brain Science, Department of Psychology, Harvard University, Cambridge, Massachusetts, USA.

Publication information

Nat Neurosci. 2017 Apr;20(4):581-589. doi: 10.1038/nn.4520. Epub 2017 Mar 6.

DOI: 10.1038/nn.4520
PMID: 28263301
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5374025/

Abstract

Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a 'belief state'). Here we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling showed a notable difference between two tasks that differed only with respect to whether reward was delivered in a deterministic manner. Our results favor an associative learning rule that combines cached values with hidden-state inference.
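
The abstract does not give the model equations, so the following is only a minimal sketch of the general idea it describes: a standard TD(0) update in which the value is computed from a belief state (a probability distribution over hidden states) rather than from a single observed state. The function names, the linear value function, and the parameters `alpha` and `gamma` are illustrative assumptions, not the authors' implementation.

```python
def td_error(reward, value_now, value_next, gamma=0.98):
    """Reward prediction error for one step: delta = r + gamma * V(next) - V(now)."""
    return reward + gamma * value_next - value_now


def belief_value(weights, belief):
    """Value of a belief state: cached per-state values weighted by the belief."""
    return sum(w * b for w, b in zip(weights, belief))


def td_update(weights, belief, next_belief, reward, alpha=0.1, gamma=0.98):
    """One TD(0) step over belief states (assumed linear value function).

    Returns the updated weights and the prediction error delta.
    """
    delta = td_error(
        reward,
        belief_value(weights, belief),
        belief_value(weights, next_belief),
        gamma,
    )
    # Credit is assigned to each hidden state in proportion to its belief.
    new_weights = [w + alpha * delta * b for w, b in zip(weights, belief)]
    return new_weights, delta


if __name__ == "__main__":
    # Hypothetical example: two hidden states, maximal uncertainty, reward arrives.
    weights, delta = td_update([0.0, 0.0], [0.5, 0.5], [0.0, 1.0], reward=1.0)
    print(delta, weights)
```

In this sketch, an unexpected reward produces a positive error, and the cached values of both hidden states are updated in proportion to how strongly each was believed to be the current state.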


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/249d/5374025/258fbb23ab99/nihms848559f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/249d/5374025/20634a98ee3b/nihms848559f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/249d/5374025/803e1a2d3bf2/nihms848559f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/249d/5374025/2b8797d18ccb/nihms848559f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/249d/5374025/2e88e3a124ef/nihms848559f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/249d/5374025/6994c9e26477/nihms848559f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/249d/5374025/2c7e82c51305/nihms848559f7.jpg
