Bakhurin Konstantin I, Hughes Ryan N, Jiang Qiaochu, Hossain Meghdoot, Gutkin Boris, Fallon Isabella P, Yin Henry
bioRxiv. 2023 Jun 7:2023.04.23.537994. doi: 10.1101/2023.04.23.537994.
According to a popular hypothesis, phasic dopamine (DA) activity encodes a reward prediction error (RPE) necessary for reinforcement learning. However, recent work showed that DA neurons are necessary for performance rather than learning. One limitation of previous work on phasic DA signaling and RPE is its reliance on limited behavioral measures. Here, we measured subtle force exertion while recording and manipulating DA activity in the ventral tegmental area (VTA) during stimulus-reward learning. We found two major populations of DA neurons that increased firing before forward and backward force exertion. Force tuning is the same regardless of learning, reward predictability, or outcome valence. Changes in the pattern of force exertion can explain results traditionally used to support the RPE hypothesis, such as modulation by reward magnitude, probability, and unpredicted reward delivery or omission. Thus, VTA DA neurons are not used to signal RPE but to regulate force exertion during motivated behavior.