Thomas H. B. FitzGerald, Raymond J. Dolan, Karl Friston
The Wellcome Trust Centre for Neuroimaging, University College London, London, UK; Max Planck-UCL Centre for Computational Psychiatry and Ageing Research, London, UK.
The Wellcome Trust Centre for Neuroimaging, University College London, London, UK.
Front Comput Neurosci. 2015 Nov 4;9:136. doi: 10.3389/fncom.2015.00136. eCollection 2015.
Temporal difference learning models propose that phasic dopamine signaling encodes reward prediction errors that drive learning. This is supported by studies in which optogenetic stimulation of dopamine neurons can substitute for actual reward. Nevertheless, a large body of data also shows that dopamine is not necessary for learning, and that dopamine depletion primarily affects task performance. We offer a resolution to this paradox based on the hypothesis that dopamine encodes the precision of beliefs about alternative actions, and thus controls the outcome-sensitivity of behavior. We extend an active inference scheme for solving Markov decision processes to include learning, and show that simulated dopamine dynamics strongly resemble those actually observed during instrumental conditioning. Furthermore, simulated dopamine depletion impairs performance but spares learning, while simulated excitation of dopamine neurons drives reward learning through aberrant inference about outcome states. Our formal approach provides a novel and parsimonious reconciliation of apparently divergent experimental findings.
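The central claim — that dopamine sets the precision with which beliefs about action values are translated into choices, while learning itself proceeds independently — can be illustrated with a deliberately simplified toy model. The sketch below is *not* the paper's active inference scheme for Markov decision processes; it is a minimal caricature using a two-armed bandit, where a hypothetical precision parameter `gamma` scales a softmax over learned values. Lowering `gamma` (mimicking dopamine depletion) degrades choice performance while leaving value learning intact.

```python
import math
import random

def run_agent(gamma, n_trials=2000, alpha=0.1, seed=0):
    """Two-armed bandit: arm 0 pays reward with p=0.8, arm 1 with p=0.2.

    gamma plays the role of precision (the hypothesized dopamine signal):
    it scales the softmax over beliefs about action values, controlling
    how strongly behavior is driven by learned outcome differences.
    The value update (learning) is deliberately independent of gamma.
    """
    rng = random.Random(seed)
    p_reward = [0.8, 0.2]
    q = [0.5, 0.5]               # beliefs about action values
    n_best = 0
    for _ in range(n_trials):
        # precision-weighted softmax action selection
        logits = [gamma * v for v in q]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        a = 0 if rng.random() < exps[0] / sum(exps) else 1
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        q[a] += alpha * (r - q[a])   # learning unaffected by gamma
        n_best += (a == 0)
    return q, n_best / n_trials

# Intact precision: behavior exploits the learned value difference.
q_hi, perf_hi = run_agent(gamma=8.0)
# "Depleted" precision: choices approach chance, yet values are still learned.
q_lo, perf_lo = run_agent(gamma=0.0)
```

Under this caricature, the depleted agent's performance hovers near chance even though its value estimates still track the true reward probabilities — echoing the dissociation the abstract describes, in which depletion impairs performance but spares learning.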