Satoh Takemasa, Nakai Sadamu, Sato Tatsuo, Kimura Minoru
Department of Physiology, Kyoto Prefectural University of Medicine, Kawaramachi-Hirokoji, Kamigyo-ku, Kyoto 602-8566, Japan.
J Neurosci. 2003 Oct 29;23(30):9913-23. doi: 10.1523/JNEUROSCI.23-30-09913.2003.
We recorded the activity of midbrain dopamine neurons in an instrumental conditioning task in which monkeys made a series of behavioral decisions on the basis of distinct reward expectations. Dopamine neurons responded to the first visual cue that appeared in each trial [conditioned stimulus (CS)], through which monkeys initiated a decision trial while expecting a trial-specific reward probability and volume. The magnitude of neuronal responses to the CS was approximately proportional to reward expectations but with considerable discrepancy. In contrast, CS responses appeared to represent motivational properties, because their magnitude in trials with identical reward expectation had a significant negative correlation with the animal's reaction times after the CS. Dopamine neurons also responded to reinforcers that occurred after behavioral decisions, and the responses precisely encoded positive and negative reward expectation errors (REEs). The gain of coding REEs by spike frequency increased as the animals learned act-outcome contingencies over a few months of task training, whereas coding of motivational properties remained consistent during learning. We found that the magnitude of CS responses was positively correlated with that of reinforcer responses, suggesting that motivation modulates the effectiveness of REEs as a teaching signal. For instance, the rate of learning could be faster when animals are motivated and slower when they are less motivated, even at identical REEs. Therefore, the dual correlated coding of motivation and REEs suggests the involvement of the dopamine system both in reinforcement, in more elaborate ways than currently proposed, and in motivational function in reward-based decision-making and learning.
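The suggestion that motivation scales the effectiveness of REEs as a teaching signal can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simple delta-rule update of the reward expectation, and the names `base_rate` and `motivation_gain`, along with all numeric values, are hypothetical choices made only for illustration.

```python
def reward_expectation_error(reward, expected):
    """REE: obtained reward minus expected reward (can be positive or negative)."""
    return reward - expected


def update_expectation(expected, reward, base_rate=0.1, motivation_gain=1.0):
    """One delta-rule step on the reward expectation.

    motivation_gain is a hypothetical multiplier on the learning rate,
    standing in for the motivational modulation suggested in the abstract.
    """
    ree = reward_expectation_error(reward, expected)
    return expected + base_rate * motivation_gain * ree


# Identical REE (reward 1.0 against an expectation of 0.5),
# but two different hypothetical motivation levels.
high_motivation = update_expectation(0.5, 1.0, motivation_gain=1.5)
low_motivation = update_expectation(0.5, 1.0, motivation_gain=0.5)
```

Under these assumptions, the expectation moves further toward the obtained reward in the high-motivation case than in the low-motivation case, although the REE is the same in both.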