Institute for Cognitive Neurodynamics, East China University of Science and Technology, Shanghai 200237, China, Brain Science Institute, Tamagawa University, Machida, Tokyo 194-8610, Japan, Department of Psychology, Senshu University, Tama-ku, Kawasaki, Kanagawa 214-8580, Japan, and Research Institute for Electronic Science, Hokkaido University, Kita-ku, Sapporo 060-0812, Japan.
J Neurosci. 2014 Jan 22;34(4):1380-96. doi: 10.1523/JNEUROSCI.2263-13.2014.
The brain contains multiple, distinct systems involved in reward prediction. To understand the nature of these processes, we recorded single-unit activity from the lateral prefrontal cortex (LPFC) and the striatum in monkeys performing a reward inference task with an asymmetric reward schedule. We found that neurons in both the LPFC and the striatum predicted reward values for stimuli that had previously been well experienced with set reward quantities in the asymmetric reward task. Importantly, these LPFC neurons could predict the reward value of a stimulus via transitive inference even when the monkeys had not yet learned the stimulus-reward association directly, whereas these striatal neurons showed no such ability. Nevertheless, because there were two set amounts of reward (large and small), the selected striatal neurons were able to infer the reward value (e.g., large) of one novel stimulus in a pair by exclusion, after directly experiencing the alternative stimulus with the other reward value (e.g., small). Our results suggest that although neurons that predict reward value for old stimuli in the LPFC could also do so for new stimuli via transitive inference, those in the striatum could predict reward for new stimuli only via exclusive inference. Moreover, the striatum showed more complex functions than previously surmised for model-free learning.