Sobell Department of Motor Neuroscience and Movement Disorders, Institute of Neurology, UCL, London WC1N 3BG, UK.
Neuron. 2011 Nov 17;72(4):654-64. doi: 10.1016/j.neuron.2011.08.024.
Reward prediction error (RPE) signals are central to current models of reward learning. Temporal difference (TD) learning models posit that these signals should be modulated by predictions not only of reward magnitude but also of reward timing. Here we show that BOLD activity in the VTA conforms to such TD predictions: responses to unexpected rewards are modulated by a temporal hazard function, and activity between a predictive stimulus and reward is depressed in proportion to the predicted reward. By contrast, BOLD activity in ventral striatum (VS) does not reflect a TD RPE, but instead encodes a signal tied to the variable relevant for behavior, here the timing but not the magnitude of reward. The results have important implications for dopaminergic models of cortico-striatal learning and suggest a modification of the conventional view that VS BOLD necessarily reflects inputs from dopaminergic VTA neurons signaling an RPE.
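To make the TD-model claim concrete, the following is a minimal illustrative sketch (not the study's analysis pipeline; all parameter values, the number of time steps, and the trial counts are arbitrary assumptions). It trains a simple TD(0) value function on trials where a cue at t=0 predicts a reward at a fixed later time step, then shows the signature the abstract describes: after learning, a reward at the expected time produces almost no RPE, while a reward shifted to an unexpected time produces a positive RPE at the new time and a negative RPE (a "dip") at the formerly expected time.

```python
# Minimal TD(0) sketch of a temporal-difference reward prediction error.
# Illustrative only: alpha, gamma, trial counts, and timings are assumptions.
import numpy as np

def run_trial(V, reward_time, alpha=0.1, gamma=1.0, n_steps=10):
    """Run one trial over within-trial time steps; update V in place
    and return the TD error (RPE) at each step."""
    deltas = np.zeros(n_steps)
    for t in range(n_steps - 1):
        r = 1.0 if t == reward_time else 0.0
        delta = r + gamma * V[t + 1] - V[t]  # TD reward prediction error
        V[t] += alpha * delta
        deltas[t] = delta
    return deltas

V = np.zeros(10)
# Training: cue at t=0, reward always delivered at step 6.
for _ in range(500):
    run_trial(V, reward_time=6)

# Expected reward: RPE at the trained time is near zero.
trained = run_trial(V, reward_time=6)
# Unexpectedly early reward: positive RPE at t=3,
# negative RPE at the omitted expected time t=6.
shifted = run_trial(V.copy(), reward_time=3)
```

In this sketch `trained[6]` is close to zero, `shifted[3]` is close to +1, and `shifted[6]` is close to -1, mirroring the timing-sensitive RPE pattern the abstract attributes to VTA BOLD responses.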