TNO Defence, Security and Safety, PO Box 23, 3769 ZG Soesterberg, The Netherlands.
Neuropsychologia. 2012 Apr;50(5):583-91. doi: 10.1016/j.neuropsychologia.2011.12.012. Epub 2011 Dec 27.
Learning to select optimal behavior in new and uncertain situations is a crucial aspect of living and requires the ability to quickly associate stimuli with actions that lead to rewarding outcomes. Mathematical models of reinforcement-based learning to select rewarding actions distinguish between (1) the formation of stimulus-action-reward associations, such that, at the instant a specific stimulus is presented, it activates a specific action, based on the expectation that this action will likely incur reward (or avoid punishment); and (2) the comparison of predicted and actual outcomes to determine whether the specific stimulus-action association yielded the intended outcome or needs revision. Animal electrophysiology and human fMRI studies converge on the notion that dissociable neural circuitries centered on the striatum are differentially involved in different components of this learning process. The modulatory role of dopamine (DA) in these respective circuits and component processes is of particular relevance to the study of reward-based learning in patients diagnosed with Parkinson's disease (PD). Here we show that the first component process, learning to predict which actions yield reward (supported by the anterior putamen and associated motor circuitry), is impaired when PD patients are taken off their DA medication, whereas DA medication has no systematic effects on the second process, outcome evaluation (supported by the caudate and ventral striatum and associated frontal circuitries). However, the effects of DA medication on these processes depend on dosage, with larger daily doses leading to a decrease in the predictability of stimulus-action-reward relations and an increase in reward-prediction errors.
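The two component processes that the abstract attributes to reinforcement-learning models can be illustrated with a generic delta-rule learner. This is a minimal sketch under stated assumptions, not the model fitted in the study: the softmax choice rule, the learning rate `alpha`, the inverse temperature `beta`, the stimulus names, and the reward probabilities are all hypothetical choices made for illustration. Component (1) corresponds to the stored action values that drive choice; component (2) corresponds to the reward-prediction error that revises them.

```python
import math
import random


def action_probabilities(values, beta):
    """Component (1): turn stored action values for one stimulus into
    choice probabilities with a softmax (beta is an assumed parameter)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def update_value(q, stimulus, action, reward, alpha):
    """Component (2): compare predicted and actual outcome, then revise
    the stimulus-action value. Returns the reward-prediction error."""
    delta = reward - q[stimulus][action]
    q[stimulus][action] += alpha * delta
    return delta


def simulate(n_trials=200, alpha=0.2, beta=5.0, seed=0):
    """Run a hypothetical two-stimulus, two-action probabilistic task."""
    rng = random.Random(seed)
    # Assumed reward probabilities per stimulus and action (illustrative).
    reward_prob = {"s1": [0.8, 0.2], "s2": [0.2, 0.8]}
    q = {s: [0.0, 0.0] for s in reward_prob}
    errors = []
    for _ in range(n_trials):
        s = rng.choice(list(reward_prob))
        probs = action_probabilities(q[s], beta)
        a = rng.choices([0, 1], weights=probs)[0]
        r = 1.0 if rng.random() < reward_prob[s][a] else 0.0
        errors.append(update_value(q, s, a, r, alpha))
    return q, errors


if __name__ == "__main__":
    q, errors = simulate()
    print(q)
    print(sum(abs(e) for e in errors) / len(errors))
```

In a sketch like this, the learned values play the role of the predictability of stimulus-action-reward relations, and the mean absolute value of `errors` gives a rough summary of the size of the reward-prediction errors.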