Rhodan Center for Nervous System Repair, Department of Neurosurgery, Massachusetts General Hospital, and Harvard Medical School, Boston, Massachusetts 02114, USA.
J Neurosci. 2011 Dec 7;31(49):17772-87. doi: 10.1523/JNEUROSCI.3793-11.2011.
Learning can be motivated by unanticipated success or unexpected failure. The former encourages us to repeat an action or activity, whereas the latter leads us to find an alternative strategy. Understanding the neural representation of these unexpected events is therefore critical to elucidate learning-related circuits. We examined the activity of neurons in the lateral prefrontal cortex (PFC) and caudate nucleus of monkeys as they performed a trial-and-error learning task. Unexpected outcomes were widely represented in both structures, and neurons driven by unexpectedly negative outcomes were as frequent as those activated by unexpectedly positive outcomes. Moreover, both positive and negative reward prediction errors (RPEs) were represented primarily by increases in firing rate, unlike the manner in which dopamine neurons have been observed to reflect these values. Interestingly, positive RPEs tended to appear with shorter latency than negative RPEs, perhaps reflecting the mechanism of their generation. Last, in the PFC but not the caudate, trial-by-trial variations in outcome-related activity were linked to the animals' subsequent behavioral decisions. More broadly, the robustness of RPE signaling by these neurons suggests that actor-critic models of reinforcement learning, in which the PFC and particularly the caudate are considered primarily to be "actors" rather than "critics," should be reconsidered to include a prominent evaluative role for these structures.