Disentangling prediction error and value in a formal test of dopamine's role in reinforcement learning.

Author Information

Usypchuk Alexandra A, Maes Etienne J P, Lozzi Megan, Avramidis Dimitrios K, Schoenbaum Geoffrey, Esber Guillem R, Gardner Matthew P H, Iordanova Mihaela D

Affiliations

Department of Psychology, Centre for Studies in Behavioural Neurobiology, Concordia University, Montreal, QC H4B 1R6, Canada.

NIDA Intramural Research Program, Baltimore, MD 21224, USA; Departments of Anatomy & Neurobiology and Psychiatry, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Solomon H. Snyder Department of Neuroscience, the Johns Hopkins University, Baltimore, MD 21287, USA.

Publication Information

Curr Biol. 2025 Aug 18;35(16):4019-4027.e7. doi: 10.1016/j.cub.2025.06.076. Epub 2025 Jul 29.

Abstract

The discovery that midbrain dopamine (DA) transients can be mapped onto reward prediction errors (RPEs), the critical signal that drives learning, is a landmark in neuroscience. Causal support for the RPE hypothesis comes from studies showing that stimulating DA neurons can drive learning under conditions where it would not otherwise occur. However, such stimulation might also promote learning by adding reward value and indirectly inducing an RPE. This added value could support new learning even when it is insufficient to support instrumental behavior. Thus, these competing interpretations are challenging to disentangle and require direct comparison under matched conditions. We developed two computational models grounded in temporal difference reinforcement learning (TDRL) that dissociate the role of DA as an RPE versus a value signal. We validated our models by showing that they both predict learning (unblocking) when ventral tegmental area (VTA) DA stimulation occurs during expected reward delivery in a behavioral blocking design and confirmed this behaviorally. We then contrasted the models by delivering constant optogenetic stimulation during reward across both learning phases of blocking. The value model predicted blocking; the RPE model predicted unblocking. Behavioral results aligned with the latter. Moreover, the RPE model uniquely predicted that constant stimulation would unblock learning at higher frequencies (>20 Hz) when the artificial error alone drives learning. This, too, was confirmed experimentally. We demonstrate a principled computational and empirical dissociation between DA as an RPE versus a value signal. Our results advance understanding of how DA neuron stimulation drives learning.
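For readers unfamiliar with the blocking logic the design relies on, the following is a minimal, trial-level Rescorla-Wagner sketch (the trial-level special case of temporal difference learning). It is not the authors' within-trial TDRL models; the function name, learning rate, and reward values are illustrative assumptions. It shows why a cue added in phase 2 normally acquires no associative strength when the reward is already fully predicted (blocking), and why boosting the reward-time signal in phase 2 only, whether read as added value or as an injected prediction error, produces unblocking. The two accounts come apart only under matched conditions such as the constant two-phase stimulation tested in the paper, which this sketch does not attempt to reproduce.

# Minimal trial-level Rescorla-Wagner sketch of blocking/unblocking.
# Illustrative only -- not the paper's within-trial TDRL models.

def simulate_blocking(phase1_reward, phase2_reward, alpha=0.1, n_trials=200):
    """Return associative strengths after A->reward, then AX->reward training."""
    w = {"A": 0.0, "X": 0.0}

    # Phase 1: cue A alone predicts the reward.
    for _ in range(n_trials):
        delta = phase1_reward - w["A"]          # prediction error on A-alone trials
        w["A"] += alpha * delta

    # Phase 2: compound A+X predicts the reward; the shared error updates both cues.
    for _ in range(n_trials):
        delta = phase2_reward - (w["A"] + w["X"])
        w["A"] += alpha * delta
        w["X"] += alpha * delta

    return w

# Blocking: the reward-time signal is unchanged across phases,
# so the phase-2 error is ~0 and X learns essentially nothing.
print(simulate_blocking(phase1_reward=1.0, phase2_reward=1.0))

# Unblocking: boosting the reward-time signal in phase 2 only
# (as when stimulation accompanies the expected reward) reinstates
# a positive error, and X acquires associative strength.
print(simulate_blocking(phase1_reward=1.0, phase2_reward=1.5))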
