Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA.
Department of Neuroscience II, Research Institute of Environmental Medicine, Nagoya University, Nagoya, Japan.
Nat Neurosci. 2022 Aug;25(8):1082-1092. doi: 10.1038/s41593-022-01109-2. Epub 2022 Jul 7.
A large body of evidence has indicated that the phasic responses of midbrain dopamine neurons show a remarkable similarity to a type of teaching signal (temporal difference (TD) error) used in machine learning. However, previous studies failed to observe a key prediction of this algorithm: that when an agent associates a cue and a reward that are separated in time, the timing of dopamine signals should gradually move backward in time from the time of the reward to the time of the cue over multiple trials. Here we demonstrate that such a gradual shift occurs both at the level of dopaminergic cellular activity and dopamine release in the ventral striatum in mice. Our results establish a long-sought link between dopaminergic activity and the TD learning algorithm, providing fundamental insights into how the brain associates cues and rewards that are separated in time.
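The "gradual backward shift" predicted by TD learning can be reproduced with a few lines of tabular TD(0). The sketch below is a generic textbook illustration, not the model fitted in this study. It assumes a "complete serial compound" time representation (one state per time bin after cue onset), a pre-cue state whose value is held at zero because cue timing is unpredictable, a reward of 1 delivered on leaving the last post-cue bin, and illustrative parameter values. The function names `simulate_td_shift` and `peak_bin` are hypothetical.

```python
def simulate_td_shift(n_steps=10, n_trials=300, alpha=0.1, gamma=1.0):
    """Tabular TD(0) on a cue -> delay -> reward trial (illustrative sketch).

    State 0 is the pre-cue interval, states 1..n_steps are successive time
    bins after cue onset, and reward (r = 1) arrives on leaving state n_steps.
    Returns one list of TD errors per trial: index 0 is the error at cue
    onset, index n_steps is the error at reward delivery.
    """
    # values[n_steps + 1] is a terminal post-reward state and stays 0.
    values = [0.0] * (n_steps + 2)
    history = []
    for _ in range(n_trials):
        deltas = []
        for t in range(n_steps + 1):
            reward = 1.0 if t == n_steps else 0.0
            # TD error for the transition t -> t + 1.
            delta = reward + gamma * values[t + 1] - values[t]
            deltas.append(delta)
            # Assumption: the pre-cue state is never updated, so the
            # cue-onset error equals gamma * values[1].
            if t > 0:
                values[t] += alpha * delta
        history.append(deltas)
    return history


def peak_bin(deltas):
    """Time bin with the largest TD error (first one on ties)."""
    return max(range(len(deltas)), key=lambda t: deltas[t])


if __name__ == "__main__":
    history = simulate_td_shift()
    for trial in (0, 10, 50, 100, 299):
        print(trial, peak_bin(history[trial]))
```

On the first trial the only nonzero TD error is at reward delivery; with a small learning rate the peak then drifts back through the intermediate delay bins over many trials before settling at cue onset. A large `alpha` makes the shift abrupt, which is one reason a gradual shift can be hard to observe experimentally.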