Intramural Research Program of the National Institute on Drug Abuse, NIH, Bethesda, MD, USA.
Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, MD, USA.
Proc Biol Sci. 2018 Nov 21;285(1891):20181645. doi: 10.1098/rspb.2018.1645.
Midbrain dopamine neurons are commonly thought to report a reward prediction error (RPE), as hypothesized by reinforcement learning (RL) theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here, we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signalling errors in both sensory and reward predictions, dopamine supports a form of RL that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and RPEs, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.
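The two error signals contrasted in the abstract can be made concrete with a minimal sketch. Below, `td_rpe` is the standard scalar temporal-difference reward prediction error from RL theory, and `sensory_pe` is a hedged illustration of a vector-valued prediction error over state features (in the spirit of successor-representation-style accounts); the function names, the tabular-feature setup, and the discount value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def td_rpe(r, v_s, v_next, gamma=0.9):
    # Classic scalar reward prediction error:
    # delta = r + gamma * V(s') - V(s)
    return r + gamma * v_next - v_s

def sensory_pe(phi_s, m_s, m_next, gamma=0.9):
    # Vector-valued prediction error over sensory/state features
    # (one error component per feature, not just reward):
    # delta = phi(s) + gamma * M(s') - M(s)
    return phi_s + gamma * m_next - m_s

# Unexpected reward of 1 with no prior value estimates yields a positive RPE.
delta_r = td_rpe(r=1.0, v_s=0.0, v_next=0.0)

# An unpredicted sensory feature (first element of phi) yields a
# feature-specific error even when no reward is involved.
phi = np.array([1.0, 0.0])
m = np.zeros(2)
delta_s = sensory_pe(phi, m_s=m, m_next=m)
```

On this kind of account, a dopamine transient could reflect either component: `delta_r` reproduces the canonical RPE correspondence, while nonzero entries of `delta_s` would register surprising outcome identity (as in identity unblocking) even at fixed reward value.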