Dalhousie University.
J Cogn Neurosci. 2014 Mar;26(3):635-44. doi: 10.1162/jocn_a_00509. Epub 2013 Oct 29.
Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors: discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the event-related brain potential (ERP) technique to demonstrate not only that rewards elicit a neural response akin to a prediction error but also that, with learning, this signal rapidly diminished and propagated back to the time of choice presentation. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component with a timing and topography similar to those of the feedback error-related negativity, whose amplitude increased with learning.
The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in the amplitude of these prediction errors at the times of choice presentation and reward delivery. Our results provide further support that the computations underlying human learning and decision making follow reinforcement learning principles.
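The learning dynamic described above — a feedback prediction error that shrinks with learning while a value signal emerges at choice presentation — is the characteristic behavior of a temporal-difference / Rescorla-Wagner learner. The following is a minimal illustrative sketch, not the authors' actual model; the learning rate and reward value are arbitrary assumptions:

```python
# Minimal sketch of a Rescorla-Wagner / TD(0)-style learner (illustrative
# only, not the authors' model). With learning, the prediction error at
# reward delivery diminishes while the learned value expressed at choice
# presentation grows -- the pattern reported in the ERP data.

alpha = 0.3    # learning rate (assumed value)
reward = 1.0   # reward delivered on every trial (assumed)
V = 0.0        # learned value of the chosen stimulus

signal_at_choice = []   # value-driven signal at choice presentation
signal_at_reward = []   # prediction error at feedback

for trial in range(20):
    signal_at_choice.append(V)       # expectation evoked by the choice cue
    delta = reward - V               # prediction error at reward delivery
    signal_at_reward.append(delta)
    V += alpha * delta               # value update toward the outcome

# Early trials: large feedback prediction error, no choice-locked signal.
# Late trials: the error has propagated back to choice presentation.
```

Under these assumptions, `signal_at_reward` decays toward zero across trials while `signal_at_choice` rises toward the reward value, mirroring the decreasing feedback error-related negativity and the growing reward positivity at choice presentation.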