Department of Neurobiology and Ethology, Haifa University, Haifa, Israel.
Exp Brain Res. 2010 Jan;200(3-4):307-17. doi: 10.1007/s00221-009-2060-6. Epub 2009 Nov 11.
The reinforcement learning hypothesis of dopamine function predicts that dopamine acts as a teaching signal by governing synaptic plasticity in the striatum. The induced changes in synaptic strength enable the cortico-striatal network to learn a mapping between situations and the actions that lead to reward. A review of the relevant neurophysiology of dopamine function in the cortico-striatal network and of the machine reinforcement learning hypothesis reveals an apparent mismatch with recent electrophysiological studies. These studies found that, in addition to the well-described reward-related responses, a subpopulation of dopamine neurons also exhibits phasic responses to aversive stimuli or to cues predicting aversive stimuli. Obviously, actions that lead to aversive events should not be reinforced. However, published data suggest that the phasic responses of dopamine neurons to reward-related stimuli have a higher firing rate and a longer duration than their phasic responses to aversion-related stimuli. We propose that, based on the different dopamine concentrations that result, the target structures are able to distinguish reward-related from aversion-related dopamine responses. The learning of actions in the basal-ganglia network thereby integrates information about both costs and benefits. This hypothesis predicts that dopamine concentration should be a crucial parameter for plasticity rules at cortico-striatal synapses. Recent in vitro studies of cortico-striatal synaptic plasticity rules support a striatal action-learning scheme in which dopamine-dependent forms of synaptic plasticity occur during reward-related dopamine release, whereas during aversion-related dopamine release the dopamine concentration allows only dopamine-independent forms of synaptic plasticity to occur.
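The concentration-dependent scheme proposed in the abstract can be illustrated with a toy model. The Python sketch below is only a hedged illustration, not the authors' model: the class name `GatedPlasticityRule`, the threshold, the step sizes, and in particular the signs of the weight changes (strengthening in the dopamine-dependent regime, weakening in the dopamine-independent regime) are all assumptions made here for illustration, and dopamine concentration is given in arbitrary units.

```python
from dataclasses import dataclass


@dataclass
class GatedPlasticityRule:
    """Toy cortico-striatal plasticity rule gated by dopamine concentration.

    Every number here is a hypothetical placeholder, not a measured value.
    """
    threshold: float = 0.5                   # assumed concentration boundary (arbitrary units)
    dopamine_dependent_step: float = 0.10    # assumed sign: strengthening
    dopamine_independent_step: float = -0.02  # assumed sign: weakening

    def regime(self, dopamine: float) -> str:
        """Return which form of plasticity the concentration permits."""
        if dopamine >= self.threshold:
            return "dopamine-dependent"
        return "dopamine-independent"

    def update(self, weight: float, dopamine: float, coactive: bool) -> float:
        """Return the new synaptic weight, clipped to the range [0, 1]."""
        if not coactive:
            # No pre/post co-activity: the synapse is not eligible for change.
            return weight
        if self.regime(dopamine) == "dopamine-dependent":
            step = self.dopamine_dependent_step
        else:
            step = self.dopamine_independent_step
        return min(1.0, max(0.0, weight + step))


rule = GatedPlasticityRule()
# High concentration stands in for a reward-related phasic response,
# low concentration for an aversion-related one (values are illustrative).
w_reward = rule.update(0.5, dopamine=0.9, coactive=True)
w_aversive = rule.update(0.5, dopamine=0.2, coactive=True)
w_idle = rule.update(0.5, dopamine=0.9, coactive=False)
print(rule.regime(0.9), w_reward)
print(rule.regime(0.2), w_aversive)
```

The single threshold is the simplest way to express the idea that target structures could read out concentration; a graded or multi-threshold dependence would be equally compatible with the abstract's wording.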