Program in Neuroscience, Harvard Medical School, Boston, MA 02115, USA; MD-PhD Program, Harvard Medical School, Boston, MA 02115, USA.
Center for Neuroscience Imaging Research, Institute for Basic Science, Suwon 16419, Republic of Korea; Department of Biomedical Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea; Department of Molecular and Cellular Biology and Center for Brain Science, Harvard University, Cambridge, MA 02138, USA.
Curr Biol. 2022 Mar 14;32(5):1077-1087.e9. doi: 10.1016/j.cub.2022.01.025. Epub 2022 Feb 2.
Reinforcement learning models of the basal ganglia map the phasic dopamine signal to reward prediction errors (RPEs). Conventional models assert that, when a stimulus predicts a reward with fixed delay, dopamine activity during the delay should converge to baseline through learning. However, recent studies have found that dopamine ramps up before reward in certain conditions even after learning, thus challenging the conventional models. In this work, we show that sensory feedback causes an unbiased learner to produce RPE ramps. Our model predicts that when feedback gradually decreases during a trial, dopamine activity should resemble a "bump," whose ramp-up phase should, furthermore, be greater than that of conditions where the feedback stays high. We trained mice on a virtual navigation task with varying brightness, and both predictions were empirically observed. In sum, our theoretical and experimental results reconcile the seemingly conflicting data on dopamine behaviors under the RPE hypothesis.
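As a point of reference for the "conventional models" mentioned above, the following is a minimal sketch of tabular temporal-difference (TD) learning on a cue, fixed delay, reward trial. It is not the authors' model and does not implement the sensory-feedback mechanism; it only illustrates the baseline claim that, with a fixed delay, the RPE during the delay converges to zero after learning. The function name `td_learning_delay`, the one-state-per-time-step representation, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def td_learning_delay(n_steps=10, gamma=0.9, alpha=0.1, n_trials=5000, reward=1.0):
    """Tabular TD(0) on a trial with a fixed cue-to-reward delay.

    Assumption: each time step after the cue is its own state (0..n_steps-1),
    and the reward arrives on the transition out of the last state.
    Returns the learned values and the RPE at each step after learning.
    """
    V = np.zeros(n_steps + 1)  # V[n_steps] is the terminal state and stays 0

    def rpe_at(t):
        # Standard TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        r = reward if t == n_steps - 1 else 0.0
        return r + gamma * V[t + 1] - V[t]

    for _ in range(n_trials):
        for t in range(n_steps):
            V[t] += alpha * rpe_at(t)  # move the value estimate toward its TD target

    rpe = np.array([rpe_at(t) for t in range(n_steps)])
    return V[:n_steps], rpe

values, rpe = td_learning_delay()
print(np.round(values, 3))      # values rise toward the reward time
print(np.abs(rpe).max() < 1e-6) # RPE is flat at zero throughout the delay
```

In this sketch the learned value rises exponentially toward the reward time while the RPE stays flat at zero, which is the conventional prediction that the reported dopamine ramps challenge. The sketch covers only the delay period and does not model the response at cue onset.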