Kato Ayaka, Morita Kenji
Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, United States.
Postdoctoral Fellowship for Research Abroad, Japan Society for the Promotion of Science, Tokyo 102-0083, Japan.
J Neurosci. 2025 Jul 18. doi: 10.1523/JNEUROSCI.0170-25.2025.
Dopamine has been suggested to encode the reward prediction error (RPE) of reinforcement learning (RL) theory, but it has also been shown to exhibit heterogeneous patterns depending on region and condition: some dopamine signals ramp up toward a predictable reward, whereas others respond only to the reward-predicting cue. It remains elusive how these heterogeneities relate to the various RL algorithms that animals and humans have been proposed to employ, such as RL under predictive state representations, hierarchical RL, and distributional RL. Here we demonstrate that these relationships can be coherently explained by incorporating decay of learned values (value-decay), which is implementable as decay of dopamine-dependent plastic changes in synaptic strengths. First, we show that value-decay causes ramping RPE under certain state representations but not under others. This accounted for the observed gradual fading of dopamine ramping over repeated reward navigation, attributed to the gradual formation of predictive state representations. It also explained the cue-type- and inter-trial-interval-dependent temporal patterns of dopamine. Next, we constructed a hierarchical RL model composed of two coupled systems, one with value-decay and one without. The model accounted for distinct patterns of neuronal activity in parallel striatal-dopamine circuits and their proposed roles in flexible learning and stable habit formation. Lastly, we examined two distinct algorithms of distributional RL, with and without value-decay. These algorithms explained how distinct dopamine patterns across striatal regions relate to the reported differences in the strength of distributional coding. These results suggest that within-striatum differences, specifically a medial-to-lateral gradient in value or synaptic decay, tune regional RL computations by generating distinct patterns of dopamine/RPE signals.

Significance Statement: Dopamine was long considered to universally represent the reward prediction error of simple reinforcement learning (RL). However, recent studies have revealed that dopamine in fact exhibits diverse patterns depending on region and condition. In parallel, it has been shown that animals' value learning cannot always be described by simple RL, but rather by more sophisticated algorithms, namely RL under particular state representations, hierarchical RL, and distributional RL. A major remaining question is how, mechanistically, the various patterns of dopamine are generated and how they implement these different RL computations across regions and conditions. We present a coherent answer to this question, in which the key is a regional difference, or gradient, in the degree of decay of dopamine-dependent plastic changes at the cortico-striatal synapses that store values.
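To make the core mechanism concrete, below is a minimal sketch of temporal-difference (TD) learning with value-decay on a linear track with a punctate (state-by-state) representation. This is an illustration of the general idea described in the abstract, not the paper's exact model: the task layout, the multiplicative per-step decay, and all parameter values are assumptions chosen for demonstration. It shows how, without decay, RPEs vanish once values converge, whereas with decay the RPE remains positive and ramps up toward the predictable reward.

```python
import numpy as np

def run_td(n_states=10, n_trials=500, alpha=0.5, gamma=0.97, decay=0.0, reward=1.0):
    """TD(0) learning on a linear track with optional value-decay.

    States 0..n_states-1 are visited in sequence on each trial; the reward is
    delivered on leaving the final state. `decay` is the fraction by which every
    learned value decays per step (0 = standard TD learning without forgetting).
    """
    V = np.zeros(n_states)
    rpe = np.zeros(n_states)  # RPE recorded at each state on the most recent trial
    for _ in range(n_trials):
        for s in range(n_states):
            v_next = V[s + 1] if s + 1 < n_states else 0.0
            r = reward if s == n_states - 1 else 0.0
            delta = r + gamma * v_next - V[s]   # TD reward prediction error
            V[s] += alpha * delta               # dopamine-dependent plasticity
            V *= (1.0 - decay)                  # value-decay (decay of plastic changes)
            rpe[s] = delta
    return V, rpe

# Without decay, RPEs shrink toward zero once the values converge; with decay,
# values are continually "forgotten", so a positive RPE reappears at every step
# and grows toward the predictable reward, i.e., a ramp.
_, rpe_no_decay = run_td(decay=0.0)
_, rpe_decay = run_td(decay=0.01)
print("RPE per state, no decay:", np.round(rpe_no_decay, 3))
print("RPE per state, decay   :", np.round(rpe_decay, 3))
```

Per the abstract, this ramp depends on the state representation: under a predictive state representation, which animals are proposed to form gradually over repeated navigation, the ramp fades. The sketch above only shows the decay side of that contrast under a punctate representation.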