Hollerman J R, Tremblay L, Schultz W
Institute of Physiology, Université de Fribourg, Switzerland.
Prog Brain Res. 2000;126:193-215. doi: 10.1016/S0079-6123(00)26015-9.
An impressive array of neural processing appears to be dedicated to the extraction of reward-related information from environmental stimuli and use of this information in the generation of goal-directed behaviors. While other structures are certainly involved in these processes, the characteristics of activations seen in mesencephalic dopamine neurons, striatal neurons and neurons of the orbitofrontal cortex provide distinct examples of the different ways in which reward-related information is processed. In addition, the differences in activations seen in these three regions demonstrate the different roles they may play in goal-directed behavior. A principal role played by dopamine neurons is that of a detector of an error in reward prediction. The homogeneity of responsiveness across the population of dopamine neurons indicates that this error signal is widely broadcast to dopamine terminal regions where it could provide a teaching signal for synaptic modifications underlying the learning of goal-directed appetitive behaviors. The responses of these same neurons to conditioned stimuli associated with reward could also serve as a signal of prediction error useful for the learning of sequences of environmental stimuli leading to reward. Dopamine neuron responses to both rewards and conditioned stimuli are not contingent on the behavior executed to obtain the reward and thus appear to reflect a relatively pure signal of a reward prediction error. It is not yet clear whether these activations, and responses to novel stimuli, have an additional function in engaging neural systems involved in the representation and execution of goal-directed behaviors. This representation of goal-directed behaviors may involve the striatal regions studied, where processing of reward-related information appears to be much more heterogeneous. 
Different subpopulations of striatal neurons are activated at different stages in the course of goal-directed behaviors, with largely separate populations activated following presentation of conditioned stimuli, preceding reinforcers, and following reinforcers. Neurons exhibiting each of these types of activation appear to differentiate between rewarding and non-rewarding outcomes of behavioral acts and, as a population, appear to be biased towards processing reward vs. non-reward. These activations observed in the striatum were often contingent on the behavioral act associated with obtaining reward, reflecting an integration of information not observed in dopamine neurons. Another difference between reward processing in striatal neurons and dopamine neurons is the influence of predictability on neuronal responsiveness. Unlike dopamine neurons, many striatal neurons respond to predicted rewards, although at least some may reflect the relative degree of predictability in the magnitude of the responses to reward. Thus, striatal processing of reward-related information is in some ways more complex than that observed in dopamine neurons, incorporating information on behavior and potentially providing more detailed information regarding predictability. These activations could serve as a component of the neural representation of the goal, and/or the behavioral aspects of goal-directed behaviors. As such they would be of use for the execution of appropriate goal-directed behaviors in response to known environmental stimuli, as well as for generating behaviors in response to novel stimuli that may be associated with desirable goals. Neuronal activations in the orbitofrontal cortex appear to involve less integration of behavioral and reward-related information, but rather incorporate another aspect of reward, the relative motivational significance of different rewards. 
These activations would serve a function similar to those striatal neurons that encode exclusively reward-related information in situations in which only a single outcome is obtainable. (ABSTRACT TRUNCATED)
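The reward prediction error attributed above to dopamine neurons can be illustrated with a minimal sketch. It assumes a simple Rescorla-Wagner-style update, which the abstract itself does not specify. The function names, the learning rate, and the reward values are hypothetical choices made for illustration only. The error is the difference between the reward received and the reward predicted, and it shrinks as the reward becomes fully predicted.

```python
# Illustrative sketch only: a Rescorla-Wagner-style update is ASSUMED here;
# the abstract does not state which learning rule, if any, applies.

def prediction_error(reward, predicted):
    """Error in reward prediction: reward received minus reward predicted."""
    return reward - predicted


def update_prediction(predicted, reward, learning_rate=0.2):
    """Move the prediction toward the received reward, scaled by the error.

    learning_rate is a hypothetical parameter, not a value from the source.
    """
    delta = prediction_error(reward, predicted)
    return predicted + learning_rate * delta, delta


def simulate(n_trials=50, reward=1.0, learning_rate=0.2):
    """Repeat the same rewarded trial and record the error on every trial."""
    predicted = 0.0
    errors = []
    for _ in range(n_trials):
        predicted, delta = update_prediction(predicted, reward, learning_rate)
        errors.append(delta)
    return predicted, errors


if __name__ == "__main__":
    predicted, errors = simulate()
    # Error is large while the reward is unpredicted and decays with learning.
    print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.6f}")
    # Omitting a fully predicted reward yields a negative error.
    print(f"error on omitted reward: {prediction_error(0.0, predicted):.3f}")
```

In this sketch the error is positive for an unpredicted reward, approaches zero once the reward is predicted, and becomes negative when a predicted reward is omitted. It is meant only to make the notion of an error in reward prediction concrete, not to model the recorded neurons.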