Schultz W
Institute of Physiology and Program in Neuroscience, University of Fribourg, CH-1700 Fribourg, Switzerland.
J Neurophysiol. 1998 Jul;80(1):1-27. doi: 10.1152/jn.1998.80.1.1.
The effects of lesions, receptor blocking, electrical self-stimulation, and drugs of abuse suggest that midbrain dopamine systems are involved in processing reward information and learning approach behavior. Most dopamine neurons show phasic activations after primary liquid and food rewards and after conditioned, reward-predicting visual and auditory stimuli. They show biphasic, activation-depression responses after stimuli that resemble reward-predicting stimuli or that are novel or particularly salient. However, only a few phasic activations follow aversive stimuli. Thus, dopamine neurons label environmental stimuli with appetitive value, predict and detect rewards, and signal alerting and motivating events. By failing to discriminate between different rewards, dopamine neurons appear to emit an alerting message about the surprising presence or absence of rewards. All responses to rewards and reward-predicting stimuli depend on event predictability. Dopamine neurons are activated by rewarding events that are better than predicted, remain uninfluenced by events that are as good as predicted, and are depressed by events that are worse than predicted. By signaling rewards according to a prediction error, dopamine responses have the formal characteristics of a teaching signal postulated by reinforcement learning theories. Dopamine responses transfer during learning from primary rewards to reward-predicting stimuli. This may contribute to neuronal mechanisms underlying the retrograde action of rewards, one of the main puzzles in reinforcement learning. The impulse response releases a short pulse of dopamine onto many dendrites, thus broadcasting a rather global reinforcement signal to postsynaptic neurons. This signal may improve approach behavior by providing advance reward information before the behavior occurs, and may contribute to learning by modifying synaptic transmission.
The dopamine reward signal is supplemented by activity in neurons in striatum, frontal cortex, and amygdala, which process specific reward information but do not emit a global reward prediction error signal. Cooperation among the different reward signals may ensure the use of specific rewards for selectively reinforcing behaviors. Among the other projection systems, noradrenaline neurons predominantly serve attentional mechanisms, and nucleus basalis neurons code rewards heterogeneously. Cerebellar climbing fibers signal errors in motor performance or errors in the prediction of aversive events to cerebellar Purkinje cells. Most deficits following dopamine-depleting lesions are not easily explained by a defective reward signal but may reflect the absence of a general enabling function of tonic levels of extracellular dopamine. Thus, dopamine systems may have two functions: the phasic transmission of reward information and the tonic enabling of postsynaptic neurons.
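The prediction-error account in the abstract (activation when an event is better than predicted, no change when it is as predicted, depression when it is worse, and transfer of the response from the reward to the reward-predicting stimulus) can be illustrated with a small temporal-difference learning simulation. The following is a minimal sketch and not the model used in the paper: the function name `run_trial`, the one-weight-per-time-step state representation starting at stimulus onset, the learning rate `alpha`, the discount factor `gamma`, and the trial timings are all assumptions chosen for illustration.

```python
def run_trial(weights, cs_time, reward_time, n_steps,
              reward=1.0, alpha=0.2, gamma=1.0, learn=True):
    """Run one trial and return the prediction error at every time step.

    Assumption: the predicted value is zero before the stimulus (CS) and is
    read from one learned weight per time step from CS onset onward.
    """
    def value(t):
        i = t - cs_time  # index of the post-CS time step
        return weights[i] if 0 <= i < len(weights) else 0.0

    deltas = [0.0] * n_steps
    for t in range(n_steps - 1):
        # Reward (if any) is received on arrival at reward_time.
        r = reward if t + 1 == reward_time else 0.0
        # Temporal-difference prediction error registered at time t + 1.
        delta = r + gamma * value(t + 1) - value(t)
        deltas[t + 1] = delta
        # Only post-CS time steps carry a learnable prediction.
        i = t - cs_time
        if learn and 0 <= i < len(weights):
            weights[i] += alpha * delta
    return deltas


# Hypothetical trial layout: stimulus at step 3, reward at step 8.
N_STEPS, CS_TIME, REWARD_TIME = 12, 3, 8
weights = [0.0] * (N_STEPS - CS_TIME)

# First trial: the reward is unpredicted.
first = run_trial(weights, CS_TIME, REWARD_TIME, N_STEPS)

# Repeated pairing of stimulus and reward.
for _ in range(300):
    trained = run_trial(weights, CS_TIME, REWARD_TIME, N_STEPS)

# Probe trial: the predicted reward is withheld, with learning switched off.
omitted = run_trial(weights, CS_TIME, REWARD_TIME, N_STEPS,
                    reward=0.0, learn=False)

for label, d in (("first", first), ("trained", trained), ("omitted", omitted)):
    print(label, "error at CS:", round(d[CS_TIME], 3),
          "error at reward time:", round(d[REWARD_TIME], 3))
```

In this sketch the error is positive at the reward on the first trial, moves to the stimulus onset after training while vanishing at the fully predicted reward, and becomes negative at the expected reward time when the reward is omitted. The error at stimulus onset persists because, under the stated assumption, nothing before the stimulus predicts it.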