Tremblay L, Schultz W
Institute of Physiology and Program in Neuroscience, University of Fribourg, CH-1700 Fribourg, Switzerland.
J Neurophysiol. 2000 Apr;83(4):1877-85. doi: 10.1152/jn.2000.83.4.1877.
This study investigated how neuronal activity in orbitofrontal cortex related to the expectation of reward changed while monkeys repeatedly learned to associate new instruction pictures with known behavioral reactions and reinforcers. In a delayed go-nogo task with several trial types, an initial picture instructed the animal to execute or withhold a reaching movement and to expect a liquid reward or a conditioned auditory reinforcer. When novel instruction pictures were presented, animals learned according to a trial-and-error strategy. After experience with a large number of novel pictures, learning occurred in a few trials, and correct performance usually exceeded 70% in the first 60-90 trials. About 150 task-related neurons in orbitofrontal cortex were studied in both familiar and learning conditions and showed two major forms of changes during learning. Quantitative changes of responses to the initial instruction were seen as appearance of new responses, increase of existing responses, or decrease or complete disappearance of responses. The changes usually outlasted initial learning trials and persisted during subsequent consolidation. They often modified the trial selectivities of activations. Increases might reflect the increased attention during learning and induce neuronal changes underlying the behavioral adaptations. Decreases might be related to the unreliable reward-predicting value of frequently changing learning instructions. The second form of changes reflected the adaptation of reward expectations during learning. In initial learning trials, animals reacted as if they expected liquid reward in every trial type, although only two of the three trial types were rewarded with liquid. In close correspondence, neuronal activations related to the expectation of reward occurred initially in every trial type. 
The behavioral indices for reward expectation and their neuronal correlates adapted in parallel during the course of learning and became restricted to rewarded trials. In conclusion, these data support the notion that neurons in orbitofrontal cortex code reward information in a flexible and adaptive manner during behavioral changes after novel stimuli.