Schultz W, Tremblay L, Hollerman J R
Institute of Physiology and Program in Neuroscience, University of Fribourg, Switzerland.
Neuropharmacology. 1998 Apr-May;37(4-5):421-9. doi: 10.1016/s0028-3908(98)00071-9.
Reward information is processed in a limited number of brain structures, including fronto-basal ganglia systems. Dopamine neurons respond phasically to primary rewards and reward-predicting stimuli, depending on how unpredictable the reward is but without discriminating between rewards. These responses reflect 'errors' in the prediction of rewards, in the sense used by learning theories, and thus may constitute teaching signals for appetitive learning. Neurons in the striatum (caudate, putamen, ventral striatum) code reward predictions in a different manner. They are activated for several seconds when animals expect predicted rewards. During learning, these activations initially occur in both rewarded and unrewarded trials and subsequently become restricted to rewarded trials. This occurs in parallel with the adaptation of reward expectations by the animals, as inferred from their behavioral reactions. Neurons in orbitofrontal cortex respond differentially to stimuli predicting different liquid rewards, without coding spatial or visual features. Thus, different structures process reward information in different ways. Whereas dopamine neurons emit a reward teaching signal without indicating the specific reward, striatal neurons adapt expectation activity to new reward situations, and orbitofrontal neurons process the specific nature of rewards. These reward signals need to cooperate in order for reward information to be used for learning and maintaining approach behavior.
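The notion of a reward prediction 'error' acting as a teaching signal can be illustrated with a small numerical sketch. The abstract does not name a specific learning model, so the Rescorla-Wagner-style update below is an assumption chosen for illustration. The function name, the learning rate, and the reward sequence are all hypothetical, not taken from the paper.

```python
def simulate_prediction_errors(rewards, learning_rate=0.2, initial_prediction=0.0):
    """Return per-trial prediction errors and the final reward prediction.

    Illustrative Rescorla-Wagner-style rule (an assumption, not the
    paper's model): error = reward - prediction, and the prediction
    moves toward the reward by a fraction `learning_rate` of that error.
    """
    prediction = initial_prediction
    errors = []
    for reward in rewards:
        error = reward - prediction          # positive when reward is better than predicted
        errors.append(error)
        prediction += learning_rate * error  # the error acts as the teaching signal
    return errors, prediction


# Hypothetical schedule: 50 rewarded trials, then one trial with the reward omitted.
schedule = [1.0] * 50 + [0.0]
errors, final_prediction = simulate_prediction_errors(schedule)
```

In this sketch the error is large on the first rewarded trial, shrinks toward zero as the reward becomes predicted, and turns negative when a predicted reward is omitted. That mirrors, qualitatively, the dependence on reward unpredictability described for dopamine responses.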