Sadacca Brian F, Jones Joshua L, Schoenbaum Geoffrey
Intramural Research Program of the National Institute on Drug Abuse, National Institutes of Health, Bethesda, United States.
Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, United States.
eLife. 2016 Mar 7;5:e13665. doi: 10.7554/eLife.13665.
Midbrain dopamine neurons have been proposed to signal reward prediction errors as defined in temporal difference (TD) learning algorithms. While these models have been extremely powerful in interpreting dopamine activity, they typically do not use value derived through inference in computing errors. This is important because much real-world behavior, and thus many opportunities for error-driven learning, is based on such predictions. Here, we show that error-signaling rat dopamine neurons respond to the inferred, model-based value of cues that have not been paired with reward and do so in the same framework as they track the putative cached value of cues previously paired with reward. This suggests that dopamine neurons access a wider variety of information than contemplated by standard TD models and that, while their firing conforms to predictions of TD models in some cases, they may not be restricted to signaling errors from TD predictions.
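For orientation only (this conventional form is not taken from the paper's methods): in a standard TD(0) account, the reward prediction error attributed to dopamine neurons is computed from cached values as

\[
\delta_t = r_{t+1} + \gamma \, V(s_{t+1}) - V(s_t),
\]

where \(V\) is the cached value function, \(r_{t+1}\) is the reward received, and \(\gamma\) is the discount factor. The abstract's contrast is with errors computed from value that must be inferred from a model of the task structure rather than read out of such a cache.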