Song Minryung R, Fellous Jean-Marc
Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea.
Graduate Interdisciplinary Program in Neuroscience, University of Arizona, Tucson, Arizona, United States of America; Department of Psychology, University of Arizona, Tucson, Arizona, United States of America; Department of Applied Mathematics, University of Arizona, Tucson, Arizona, United States of America.
PLoS One. 2014 Feb 26;9(2):e89494. doi: 10.1371/journal.pone.0089494. eCollection 2014.
Because most rewarding events are probabilistic and changing, the extinction of probabilistic rewards is important for survival. It has been proposed that the extinction of probabilistic rewards depends on arousal and on the amount of learning of reward values. Midbrain dopamine neurons have been suggested to play a role in both arousal and the learning of reward values. Despite extensive research on modeling dopaminergic activity in reward learning (e.g., temporal difference models), few studies have modeled its role in arousal. Although temporal difference models capture key characteristics of dopaminergic activity during the extinction of deterministic rewards, they have been less successful at simulating the extinction of probabilistic rewards. By adding an arousal signal to a temporal difference model, we were able to simulate the extinction of probabilistic rewards and its dependence on the amount of learning. Our simulations suggest that arousal allows the probability of reward to have lasting effects on the updating of reward value, which slows the extinction of low-probability rewards. Using this model, we predicted that, by signaling the prediction error, dopamine determines the learned reward value that must be extinguished and helps regulate the size of the arousal signal that controls the learning rate. These predictions were supported by pharmacological experiments in rats.
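A minimal sketch, in Python, of the mechanism described in the abstract: a standard temporal difference value update whose learning rate is scaled by an arousal signal. The abstract does not give the model's equations, so the specifics here are illustrative assumptions: the function simulate_extinction, its parameters (base_alpha, arousal_decay), and the choice to model arousal as a leaky integral of received rewards are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def simulate_extinction(p_reward, n_acquisition, n_extinction,
                        base_alpha=0.4, arousal_decay=0.95, seed=0):
    """Acquisition under a probabilistic reward, then extinction (no reward).

    Returns the learned reward value V on every trial. All dynamics are
    illustrative assumptions, not the equations from the paper.
    """
    rng = np.random.default_rng(seed)
    V = 0.0          # learned reward value
    arousal = 0.0    # hypothetical arousal trace (recent reward rate)
    values = []
    for t in range(n_acquisition + n_extinction):
        # Reward is delivered with probability p_reward during acquisition
        # and never during extinction.
        r = float(rng.random() < p_reward) if t < n_acquisition else 0.0
        delta = r - V                 # prediction error (dopamine-like signal)
        # Arousal is a leaky integral of received rewards, so the reward
        # probability leaves a lasting trace that persists into extinction.
        arousal = arousal_decay * arousal + (1.0 - arousal_decay) * r
        # Arousal scales the effective learning rate of the TD update.
        V += base_alpha * arousal * delta
        values.append(V)
    return np.array(values)

# Example: extinction after a certain (p=1.0) vs. a 25% probabilistic reward.
v_certain = simulate_extinction(p_reward=1.0, n_acquisition=200, n_extinction=100)
v_partial = simulate_extinction(p_reward=0.25, n_acquisition=200, n_extinction=100)
```

In this sketch, a low reward probability leaves a low arousal trace at the start of extinction, so the effective learning rate is small and the learned value decays slowly, consistent with the abstract's claim that arousal lets reward probability have lasting effects on value updating and slows the extinction of low-probability rewards.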