Striatal Gradient in Value-Decay Explains Regional Differences in Dopamine Patterns and Reinforcement Learning Computations.

Author Information

Kato Ayaka, Morita Kenji

Affiliations

Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, United States.

Postdoctoral Fellowship for Research Abroad, Japan Society for the Promotion of Science, Tokyo 102-0083, Japan.

Publication Information

J Neurosci. 2025 Jul 18. doi: 10.1523/JNEUROSCI.0170-25.2025.

DOI: 10.1523/JNEUROSCI.0170-25.2025
PMID: 40681344
Abstract

Dopamine has been suggested to encode reward-prediction error (RPE) in reinforcement learning (RL) theory, but it has also been shown to exhibit heterogeneous patterns depending on region and condition: some regions show a ramping response to predictable reward, while others respond only to the reward-predicting cue. It remains unclear how these heterogeneities relate to the various RL algorithms that animals and humans are proposed to employ, such as RL under predictive state representations, hierarchical RL, and distributional RL. Here we demonstrate that these relationships can be coherently explained by incorporating the decay of learned values (value-decay), implementable through the decay of dopamine-dependent plastic changes in synaptic strengths. First, we show that value-decay causes ramping RPE under certain state representations but not under others. This accounted for the observed gradual fading of dopamine ramping across repeated reward navigation, attributed to the gradual formation of predictive state representations, and it also explained the cue-type- and inter-trial-interval-dependent temporal patterns of dopamine. Next, we constructed a hierarchical RL model composed of two coupled systems, one with value-decay and one without. The model accounted for distinct patterns of neuronal activity in parallel striatal-dopamine circuits and for their proposed roles in flexible learning and stable habit formation. Lastly, we examined two distinct algorithms of distributional RL, with and without value-decay. These algorithms explained how distinct dopamine patterns across striatal regions relate to the reported differences in the strength of distributional coding. These results suggest that within-striatum differences, specifically a medial-to-lateral gradient in value or synaptic decay, tune regional RL computations by generating distinct patterns of dopamine/RPE signals.

Dopamine had long been considered to universally represent reward-prediction error for simple reinforcement learning (RL). However, recent studies have revealed that dopamine in fact exhibits various patterns depending on region and condition. At the same time, it has been shown that animals' value learning cannot always be described by simple RL, but rather by more sophisticated algorithms, namely RL under particular state representations, hierarchical RL, and distributional RL. A major remaining question is how, mechanistically, the various patterns of dopamine are generated and how they achieve the corresponding RL computations in different regions and conditions. We present a coherent answer in which the key is a regional difference, or gradient, in the degree of decay of the dopamine-dependent plastic changes in the cortico-striatal synapses that store values.
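The core mechanism behind the first result can be illustrated with a short simulation. The sketch below is not the authors' code: it is a minimal TD(0) value learner on a linear track, with a multiplicative value-decay applied to all stored values at every step, in the spirit of the decay-based account of dopamine ramping (see reference 5 in Similar Articles). The parameters alpha, gamma, phi, and the track length are illustrative choices, not values from the paper.

```python
# Minimal sketch (assumed parameters, not the paper's model code):
# TD(0) learning on a linear track with per-step value-decay ("forgetting").
import numpy as np

n_states, alpha, gamma, phi = 10, 0.5, 0.97, 0.02  # phi: value-decay rate
V = np.zeros(n_states + 1)  # V[n_states] is the terminal (post-reward) state

for trial in range(500):
    for s in range(n_states):
        r = 1.0 if s == n_states - 1 else 0.0   # reward at the last state
        delta = r + gamma * V[s + 1] - V[s]     # TD reward-prediction error
        V[s] += alpha * delta                   # standard TD(0) update
        V *= (1.0 - phi)                        # decay of all learned values

# Recompute the RPE along the track after learning has stabilized.
rpe = [(1.0 if s == n_states - 1 else 0.0) + gamma * V[s + 1] - V[s]
       for s in range(n_states)]
print(np.round(rpe, 3))
```

With phi > 0, the stored values never fully converge, so the printed RPE stays positive and grows toward the reward location, a ramp-like profile; setting phi = 0 recovers the textbook TD result in which the RPE for a fully predicted reward converges to roughly zero everywhere except at the predictive cue.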


Similar Articles

1. Striatal Gradient in Value-Decay Explains Regional Differences in Dopamine Patterns and Reinforcement Learning Computations.
J Neurosci. 2025 Jul 18. doi: 10.1523/JNEUROSCI.0170-25.2025.
2. Prescription of Controlled Substances: Benefits and Risks.
3. A multidimensional distributional map of future reward in dopamine neurons.
Nature. 2025 Jun;642(8068):691-699. doi: 10.1038/s41586-025-09089-6. Epub 2025 Jun 4.
4. Short-Term Memory Impairment.
5. Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits.
Front Neural Circuits. 2014 Apr 9;8:36. doi: 10.3389/fncir.2014.00036. eCollection 2014.
6. Disentangling prediction error and value in a formal test of dopamine's role in reinforcement learning.
Curr Biol. 2025 Aug 18;35(16):4019-4027.e7. doi: 10.1016/j.cub.2025.06.076. Epub 2025 Jul 29.
7. Nucleus accumbens dopamine encodes the trace period during appetitive Pavlovian conditioning.
bioRxiv. 2025 Apr 3:2025.01.07.631806. doi: 10.1101/2025.01.07.631806.
8. Striatal dopamine represents valence on dynamic regional scales.
J Neurosci. 2025 Mar 17;45(17). doi: 10.1523/JNEUROSCI.1551-24.2025.
9. Multi-timescale reinforcement learning in the brain.
Nature. 2025 Jun 4. doi: 10.1038/s41586-025-08929-9.
10. Natural behaviour is learned through dopamine-mediated reinforcement.
Nature. 2025 May;641(8063):699-706. doi: 10.1038/s41586-025-08729-1. Epub 2025 Mar 12.
