Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; UCL Institute of Ophthalmology, University College London, London EC1V 9EL, UK.
Brain Science Institute, Tamagawa University, Machida, Tokyo 194-8610, Japan; Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Av. de Brasilia, 1400-038 Lisbon, Portugal.
Curr Biol. 2017 Mar 20;27(6):821-832. doi: 10.1016/j.cub.2017.02.026. Epub 2017 Mar 9.
Central to the organization of behavior is the ability to predict the values of outcomes to guide choices. The accuracy of such predictions is honed by a teaching signal that indicates how incorrect a prediction was ("reward prediction error," RPE). In several reinforcement learning contexts, such as Pavlovian conditioning and decisions guided by reward history, this RPE signal is provided by midbrain dopamine neurons. In many situations, however, the stimuli predictive of outcomes are perceptually ambiguous. Perceptual uncertainty is known to influence choices, but it has been unclear whether or how dopamine neurons factor it into their teaching signal. To cope with uncertainty, we extended a reinforcement learning model with a belief state about the perceptually ambiguous stimulus; this model generates an estimate of the probability of choice correctness, termed decision confidence. We show that dopamine responses in monkeys performing a perceptually ambiguous decision task comply with the model's predictions. Consequently, dopamine responses did not simply reflect a stimulus' average expected reward value but were predictive of the trial-to-trial fluctuations in perceptual accuracy. These confidence-dependent dopamine responses emerged prior to monkeys' choice initiation, raising the possibility that dopamine impacts impending decisions, in addition to encoding a post-decision teaching signal. Finally, by manipulating reward size, we found that dopamine neurons reflect both the upcoming reward size and the confidence in achieving it. Together, our results show that dopamine responses convey teaching signals that are also appropriate for perceptual decisions.
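The abstract's central idea is that the expected reward of a perceptual choice is scaled by a belief about whether that choice is correct. The Python sketch below illustrates that idea; it is not the authors' model. It assumes a signed, hypothetical stimulus strength `stimulus` that the agent observes through Gaussian noise of width `sigma`, and a flat prior over the stimulus. Under that prior, the posterior that the stimulus is positive is Φ(percept/σ). Decision confidence is the posterior probability that the chosen side is correct. The predicted value is confidence × reward size, and the outcome RPE is the delivered reward minus that prediction. All names here (`choose_and_confidence`, `outcome_rpe`, `simulate`) are hypothetical.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def choose_and_confidence(percept: float, sigma: float) -> tuple[int, float]:
    """Pick a side from a noisy percept and return (choice, confidence).

    Assumption: flat prior over the stimulus and Gaussian sensory noise, so
    P(stimulus > 0 | percept) = Phi(percept / sigma). Confidence is the
    posterior probability that the chosen side (+1 or -1) is correct.
    """
    choice = 1 if percept >= 0 else -1
    confidence = normal_cdf(abs(percept) / sigma)
    return choice, confidence


def expected_value(confidence: float, reward_size: float) -> float:
    """Belief-weighted prediction: reward size times P(correct)."""
    return confidence * reward_size


def outcome_rpe(correct: bool, confidence: float, reward_size: float) -> float:
    """Reward prediction error at feedback: obtained reward minus prediction."""
    reward = reward_size if correct else 0.0
    return reward - expected_value(confidence, reward_size)


def simulate(stimulus: float, sigma: float, reward_size: float,
             n_trials: int, seed: int = 0) -> list[tuple[bool, float, float]]:
    """Repeat one nonzero stimulus; return (correct, confidence, rpe) per trial.

    Trial-to-trial noise in the percept makes confidence, and therefore the
    prediction and the RPE, fluctuate even though the stimulus is fixed.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        percept = stimulus + rng.gauss(0.0, sigma)
        choice, conf = choose_and_confidence(percept, sigma)
        correct = (choice == 1) == (stimulus > 0)
        trials.append((correct, conf, outcome_rpe(correct, conf, reward_size)))
    return trials


if __name__ == "__main__":
    trials = simulate(stimulus=0.5, sigma=1.0, reward_size=1.0, n_trials=10000)
    conf_correct = [c for ok, c, _ in trials if ok]
    conf_error = [c for ok, c, _ in trials if not ok]
    print("mean confidence, correct trials:", sum(conf_correct) / len(conf_correct))
    print("mean confidence, error trials:  ", sum(conf_error) / len(conf_error))
```

In this sketch, correct trials carry higher average confidence than error trials for the same stimulus. Because the prediction scales with both confidence and `reward_size`, it shows in a toy setting how one signal can reflect reward size together with the belief of obtaining it. A positive RPE follows every reward and a negative RPE every omission, and both shrink as confidence moves toward the actual outcome.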