Disentangling prediction error and value in a formal test of dopamine's role in reinforcement learning.

Author Information

Usypchuk Alexandra A, Maes Etienne J P, Lozzi Megan, Avramidis Dimitrios K, Schoenbaum Geoffrey, Esber Guillem R, Gardner Matthew P H, Iordanova Mihaela D

Affiliations

Department of Psychology, Centre for Studies in Behavioural Neurobiology, Concordia University, Montreal, QC H4B 1R6, Canada.

NIDA Intramural Research Program, Baltimore, MD 21224, USA; Departments of Anatomy & Neurobiology and Psychiatry, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Solomon H. Snyder Department of Neuroscience, the Johns Hopkins University, Baltimore, MD 21287, USA.

Publication Information

Curr Biol. 2025 Aug 18;35(16):4019-4027.e7. doi: 10.1016/j.cub.2025.06.076. Epub 2025 Jul 29.

Abstract

The discovery that midbrain dopamine (DA) transients can be mapped onto reward prediction errors (RPEs), the critical signal that drives learning, is a landmark in neuroscience. Causal support for the RPE hypothesis comes from studies showing that stimulating DA neurons can drive learning under conditions where it would not otherwise occur. However, such stimulation might also promote learning by adding reward value and indirectly inducing an RPE. This added value could support new learning even when it is insufficient to support instrumental behavior. Thus, these competing interpretations are challenging to disentangle and require direct comparison under matched conditions. We developed two computational models grounded in temporal difference reinforcement learning (TDRL) that dissociate the role of DA as an RPE versus a value signal. We validated our models by showing that they both predict learning (unblocking) when ventral tegmental area (VTA) DA stimulation occurs during expected reward delivery in a behavioral blocking design and confirmed this behaviorally. We then contrasted the models by delivering constant optogenetic stimulation during reward across both learning phases of blocking. The value model predicted blocking; the RPE model predicted unblocking. Behavioral results aligned with the latter. Moreover, the RPE model uniquely predicted that constant stimulation would unblock learning at higher frequencies (>20 Hz) when the artificial error alone drives learning. This, too, was confirmed experimentally. We demonstrate a principled computational and empirical dissociation between DA as an RPE versus a value signal. Our results advance understanding of how DA neuron stimulation drives learning.
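For readers unfamiliar with the blocking logic the design relies on, the following is a minimal, trial-level Rescorla-Wagner sketch (the trial-level special case of temporal difference learning). It is not the authors' within-trial TDRL models; the function name, learning rate, and reward values are illustrative assumptions. It shows why a cue added in phase 2 normally acquires no associative strength when the reward is already fully predicted (blocking), and why boosting the reward-time signal in phase 2 only, whether read as added value or as an injected prediction error, produces unblocking. The two accounts come apart only under matched conditions such as the constant two-phase stimulation tested in the paper, which this sketch does not attempt to reproduce.

# Minimal trial-level Rescorla-Wagner sketch of blocking/unblocking.
# Illustrative only -- not the paper's within-trial TDRL models.

def simulate_blocking(phase1_reward, phase2_reward, alpha=0.1, n_trials=200):
    """Return associative strengths after A->reward, then AX->reward training."""
    w = {"A": 0.0, "X": 0.0}

    # Phase 1: cue A alone predicts the reward.
    for _ in range(n_trials):
        delta = phase1_reward - w["A"]          # prediction error on A-alone trials
        w["A"] += alpha * delta

    # Phase 2: compound A+X predicts the reward; the shared error updates both cues.
    for _ in range(n_trials):
        delta = phase2_reward - (w["A"] + w["X"])
        w["A"] += alpha * delta
        w["X"] += alpha * delta

    return w

# Blocking: the reward-time signal is unchanged across phases,
# so the phase-2 error is ~0 and X learns essentially nothing.
print(simulate_blocking(phase1_reward=1.0, phase2_reward=1.0))

# Unblocking: boosting the reward-time signal in phase 2 only
# (as when stimulation accompanies the expected reward) reinstates
# a positive error, and X acquires associative strength.
print(simulate_blocking(phase1_reward=1.0, phase2_reward=1.5))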
