Department of Psychology, University of California, Los Angeles, Portola Plaza, Los Angeles, CA 91602, USA.
Curr Biol. 2022 Jul 25;32(14):3210-3218.e3. doi: 10.1016/j.cub.2022.06.035. Epub 2022 Jun 24.
For over two decades, phasic activity in midbrain dopamine neurons was considered synonymous with the prediction error in temporal-difference reinforcement learning. Central to this proposal is the notion that reward-predictive stimuli become endowed with the scalar value of predicted rewards. When these cues are subsequently encountered, their predictive value is compared to the value of the actual reward received, allowing for the calculation of prediction errors. Phasic firing of dopamine neurons was proposed to reflect this computation, facilitating the backpropagation of value from the predicted reward to the reward-predictive stimulus, thereby reducing future prediction errors. This proposal rests on two critical assumptions: (1) that dopamine errors can only facilitate learning about scalar value and not more complex features of predicted rewards, and (2) that the dopamine signal can only be involved in anticipatory cue-reward learning, in which cues or actions precede rewards. Recent work has challenged the first assumption, demonstrating that phasic dopamine signals across species are involved in learning about more complex features of the predicted outcomes, in a manner that transcends this value computation. Here, we tested the validity of the second assumption. Specifically, we examined whether phasic midbrain dopamine activity is necessary for backward conditioning, in which a neutral cue reliably follows a rewarding outcome. Using a specific Pavlovian-to-instrumental transfer (PIT) procedure, we show that rats learn both excitatory and inhibitory components of a backward association, and that this association entails knowledge of the specific identity of the reward and cue. We demonstrate that brief optogenetic inhibition of VTA neurons, timed to the transition between the reward and cue, reduces both of these components of backward conditioning.
These findings suggest VTA neurons are capable of facilitating associations between contiguously occurring events, regardless of the content of those events. We conclude that these data may be in line with suggestions that the VTA error acts as a universal teaching signal. This may provide insight into why dopamine function has been implicated in myriad psychological disorders that are characterized by very distinct reinforcement-learning deficits.