Departamento de Física Teórica, Universidad Autónoma de Madrid, Cantoblanco 28049, Madrid, Spain.
Centro de Investigación Avanzada en Física Fundamental, Universidad Autónoma de Madrid, Cantoblanco 28049, Madrid, Spain.
Proc Natl Acad Sci U S A. 2017 Nov 28;114(48):E10494-E10503. doi: 10.1073/pnas.1712479114. Epub 2017 Nov 13.
Learning to associate unambiguous sensory cues with rewarded choices is known to be mediated by dopamine (DA) neurons. However, little is known about how these neurons behave when choices rely on uncertain reward-predicting stimuli. To study this issue, we reanalyzed DA recordings from monkeys engaged in the detection of weak tactile stimuli delivered at random times and formulated a reinforcement learning model based on belief states. Specifically, we investigated how the firing activity of DA neurons should behave if they were coding the error in the prediction of the total future reward when animals made decisions relying on uncertain sensory and temporal information. Our results show that the same signal that codes for reward prediction errors also codes the animal's certainty about the presence of the stimulus and the temporal expectation of sensory cues.
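The idea of a reward prediction error computed over belief states can be illustrated with a minimal sketch. This is not the authors' model: the two hidden states (stimulus absent, stimulus present), the onset probability, the observation likelihoods, the linear value function, and the names `belief_update`, `value`, and `td_step` are all assumptions chosen for illustration. The sketch combines one step of Bayesian filtering over the hidden state with a standard TD(0) update, where the value is a weighted sum of the belief.

```python
def belief_update(belief, transition, likelihood):
    """One Bayesian filtering step over hidden states.

    transition[i][j] is the assumed probability of moving from state i to j;
    likelihood[j] is the assumed probability of the current observation in state j.
    """
    n = len(belief)
    # Predict: propagate the current belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Correct: weight by the observation likelihood, then normalize.
    unnormalized = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def value(weights, belief):
    """Linear value function V(b) = w . b (an illustrative assumption)."""
    return sum(w * b for w, b in zip(weights, belief))


def td_step(weights, belief, next_belief, reward, alpha=0.1, gamma=0.98):
    """TD(0) on belief states: returns the prediction error and updated weights."""
    delta = reward + gamma * value(weights, next_belief) - value(weights, belief)
    new_weights = [w + alpha * delta * b for w, b in zip(weights, belief)]
    return delta, new_weights


# Hypothetical two-state example: index 0 = stimulus absent, 1 = stimulus present.
transition = [[0.9, 0.1],   # stimulus may appear with probability 0.1 per step
              [0.0, 1.0]]   # once present, it is treated as present
belief = [1.0, 0.0]         # start certain that no stimulus has occurred
likelihood = [0.4, 0.6]     # a weak observation, only mildly favoring "present"

next_belief = belief_update(belief, transition, likelihood)
delta, weights = td_step([0.0, 0.0], belief, next_belief, reward=1.0)
```

In this sketch the prediction error `delta` depends on the belief both before and after the observation, so a signal of this form would vary with how certain the agent is that a stimulus occurred and with when it is expected to occur.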