Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bldg. 49, Rm. 2A50, Bethesda, Maryland 20892-4435, USA.
J Neurophysiol. 2010 Aug;104(2):1068-76. doi: 10.1152/jn.00158.2010. Epub 2010 Jun 10.
The reward value of a stimulus can be learned through two distinct mechanisms: reinforcement learning through repeated stimulus-reward pairings and abstract inference based on knowledge of the task at hand. The reinforcement mechanism is often identified with midbrain dopamine neurons. Here we show that a neural pathway controlling the dopamine system does not rely exclusively on either stimulus-reward pairings or abstract inference but instead uses a combination of the two. We trained monkeys to perform a reward-biased saccade task in which the reward values of two saccade targets were related in a systematic manner. Animals used each trial's reward outcome to learn the values of both targets: the target that had been presented and whose reward outcome had been experienced (experienced value) and the target that had not been presented but whose value could be inferred from the reward statistics of the task (inferred value). We then recorded from three populations of reward-coding neurons: substantia nigra dopamine neurons; a major input to dopamine neurons, the lateral habenula; and neurons that project to the lateral habenula, located in the globus pallidus. All three populations encoded both experienced values and inferred values. In some animals, neurons encoded experienced values more strongly than inferred values, and the animals showed behavioral evidence of learning faster from experience than from inference. Our data indicate that the pallidus-habenula-dopamine pathway signals reward values estimated through both experience and inference.