Daw Nathaniel D, Touretzky David S
Computer Science Department and Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Neural Comput. 2002 Nov;14(11):2567-83. doi: 10.1162/089976602760407973.
This article addresses the relationship between long-term reward predictions and slow-timescale neural activity in temporal difference (TD) models of the dopamine system. Such models attempt to explain how the activity of dopamine (DA) neurons relates to errors in the prediction of future rewards. Previous models have been mostly restricted to short-term predictions of rewards expected during a single, somewhat artificially defined trial. Also, the models focused exclusively on the phasic pause-and-burst activity of primate DA neurons; the neurons' slower, tonic background activity was assumed to be constant. This has led to difficulty in explaining the results of neurochemical experiments that measure indications of DA release on a slow timescale, results that seem at first glance inconsistent with a reward prediction model. In this article, we investigate a TD model of DA activity modified so as to enable it to make longer-term predictions about rewards expected far in the future. We show that these predictions manifest themselves as slow changes in the baseline error signal, which we associate with tonic DA activity. Using this model, we make new predictions about the behavior of the DA system in a number of experimental situations. Some of these predictions suggest new computational explanations for previously puzzling data, such as indications from microdialysis studies of elevated DA activity triggered by aversive events.
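To illustrate how a long-term reward prediction could appear as a slow shift in the baseline error signal, the following is a minimal sketch based on the average-reward form of TD learning. The abstract does not give the model's equations, so the error rule, the function names, the averaging rate, and the reward streams here are all assumptions chosen for illustration, not the authors' implementation.

```python
def average_reward_td_error(reward, avg_reward, v_next, v_curr):
    """Assumed average-reward TD error: delta = r - rho + V(s') - V(s)."""
    return reward - avg_reward + v_next - v_curr


def update_avg_reward(avg_reward, reward, rate=0.01):
    """Slow exponential running average of reward (rate is a hypothetical value)."""
    return avg_reward + rate * (reward - avg_reward)


def baseline_error_after(rewards, rate=0.01):
    """Track the slow reward average over a reward stream, then return the
    error on an unrewarded step where the value does not change (V(s') == V(s)).
    On such a step the error reduces to -rho, a stand-in for the tonic baseline."""
    rho = 0.0
    for r in rewards:
        rho = update_avg_reward(rho, r, rate)
    return average_reward_td_error(0.0, rho, 0.0, 0.0)


# Hypothetical reward histories: a reward-rich context and an aversive one.
rich_baseline = baseline_error_after([1.0] * 1000)
aversive_baseline = baseline_error_after([-1.0] * 1000)
```

Under these assumptions, the baseline error falls below zero after a rich reward history and rises above zero after an aversive one. This is one way a reward prediction model could produce a slow elevation of the error signal following aversive events.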