Dopamine transients encode reward prediction errors independent of learning rates.

Author information

Center for Neural Science, New York University, New York, NY, USA.

Publication information

Cell Rep. 2024 Oct 22;43(10):114840. doi: 10.1016/j.celrep.2024.114840. Epub 2024 Oct 11.

DOI:10.1016/j.celrep.2024.114840

PMID:39395170

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11571066/

Abstract

Biological accounts of reinforcement learning posit that dopamine encodes reward prediction errors (RPEs), which are multiplied by a learning rate to update state or action values. These values are thought to be represented by corticostriatal synaptic weights, which are updated by dopamine-dependent plasticity. This suggests that dopamine release reflects the product of the learning rate and RPE. Here, we characterize dopamine encoding of learning rates in the nucleus accumbens core (NAcc) in a volatile environment. Using a task with semi-observable states offering different rewards, we find that rats adjust how quickly they initiate trials across states using RPEs. Computational modeling and behavioral analyses show that learning rates are higher following state transitions and scale with trial-by-trial changes in beliefs about hidden states, approximating normative Bayesian strategies. Notably, dopamine release in the NAcc encodes RPEs independent of learning rates, suggesting that dopamine-independent mechanisms instantiate dynamic learning rates.
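The update rule the abstract refers to can be illustrated with a minimal sketch. This is not the authors' model: the class name `ValueLearner`, the parameters `base_rate` and `gain`, and the linear dependence of the learning rate on belief change are all assumptions chosen for illustration. The sketch only shows how an RPE can be computed separately from a learning rate that grows when beliefs about the hidden state shift.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ValueLearner:
    """Delta-rule learner with a belief-dependent learning rate (illustrative only)."""

    value: float = 0.0
    base_rate: float = 0.1  # hypothetical baseline learning rate
    gain: float = 0.8       # hypothetical scaling of the rate by belief change

    def learning_rate(self, belief_change: float) -> float:
        # Assumption: larger shifts in belief about the hidden state
        # produce a larger learning rate, capped at 1.
        return min(self.base_rate + self.gain * abs(belief_change), 1.0)

    def update(self, reward: float, belief_change: float = 0.0) -> Tuple[float, float]:
        rpe = reward - self.value                  # reward prediction error
        alpha = self.learning_rate(belief_change)  # computed independently of the RPE
        self.value += alpha * rpe                  # value changes by alpha * RPE
        return rpe, alpha


learner = ValueLearner(value=10.0)
# Stable state: beliefs do not change, so the baseline rate applies.
rpe_stable, alpha_stable = learner.update(reward=20.0, belief_change=0.0)
# After a putative state transition: beliefs shift, so the rate is higher.
rpe_shift, alpha_shift = learner.update(reward=20.0, belief_change=0.5)
```

In this sketch the RPE returned by `update` does not depend on `alpha`, which mirrors the reported finding that the dopamine signal tracks the RPE rather than the product of learning rate and RPE.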

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8ef6/11571066/50156759e793/nihms-2031357-f0002.jpg
