A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning.

Affiliations

Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA.

Department of Neuroscience II, Research Institute of Environmental Medicine, Nagoya University, Nagoya, Japan.

Publication information

Nat Neurosci. 2022 Aug;25(8):1082-1092. doi: 10.1038/s41593-022-01109-2. Epub 2022 Jul 7.

DOI:10.1038/s41593-022-01109-2

PMID:35798979

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9624460/

Abstract

A large body of evidence has indicated that the phasic responses of midbrain dopamine neurons show a remarkable similarity to a type of teaching signal (temporal difference (TD) error) used in machine learning. However, previous studies failed to observe a key prediction of this algorithm: that when an agent associates a cue and a reward that are separated in time, the timing of dopamine signals should gradually move backward in time from the time of the reward to the time of the cue over multiple trials. Here we demonstrate that such a gradual shift occurs both at the level of dopaminergic cellular activity and dopamine release in the ventral striatum in mice. Our results establish a long-sought link between dopaminergic activity and the TD learning algorithm, providing fundamental insights into how the brain associates cues and rewards that are separated in time.
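
The predicted backward shift can be illustrated with a small simulation. The sketch below is a textbook TD(0) learner, not the authors' model: it assumes one state per time step between cue and reward (a tapped-delay-line representation), a pre-cue baseline value of zero, and arbitrary values for the learning rate `alpha`, the discount factor `gamma`, and the cue-reward interval `n_steps`. The function name `simulate_td` is hypothetical.

```python
import numpy as np


def simulate_td(n_trials=200, n_steps=10, alpha=0.2, gamma=0.98, reward=1.0):
    """Simulate TD(0) learning of a cue-reward association.

    Returns an array of TD errors with shape (n_trials, n_steps + 1).
    Column 0 is the error at cue onset; column n_steps is the error
    at reward delivery. All parameter values are illustrative.
    """
    values = np.zeros(n_steps)  # one learned value per post-cue time step
    deltas = np.zeros((n_trials, n_steps + 1))
    for trial in range(n_trials):
        # Cue onset: transition from the pre-cue baseline (value assumed 0)
        # into the first post-cue state. No reward is delivered here.
        deltas[trial, 0] = gamma * values[0]
        for t in range(n_steps):
            last = t == n_steps - 1
            r = reward if last else 0.0           # reward only at the final step
            next_value = 0.0 if last else values[t + 1]
            delta = r + gamma * next_value - values[t]  # TD error
            values[t] += alpha * delta            # update the current state's value
            deltas[trial, t + 1] = delta
    return deltas


deltas = simulate_td()
peak_step = deltas.argmax(axis=1)  # time step of the largest TD error, per trial
for trial in (0, 25, 199):
    print(f"trial {trial}: peak TD error at step {peak_step[trial]}")
```

Under these assumptions the largest TD error sits at the reward step on the first trial, at an intermediate step partway through training, and at cue onset once learning has converged, which is the gradual shift described in the abstract.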

Similar articles

The timing of action determines reward prediction signals in identified midbrain dopamine neurons.

Nat Neurosci. 2018 Nov;21(11):1563-1573. doi: 10.1038/s41593-018-0245-7. Epub 2018 Oct 15.

Dopamine errors drive excitatory and inhibitory components of backward conditioning in an outcome-specific manner.

Curr Biol. 2022 Jul 25;32(14):3210-3218.e3. doi: 10.1016/j.cub.2022.06.035. Epub 2022 Jun 24.

Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies.

J Neurosci. 2017 Aug 30;37(35):8315-8329. doi: 10.1523/JNEUROSCI.1221-17.2017. Epub 2017 Jul 24.

Midbrain dopamine neurons signal phasic and ramping reward prediction error during goal-directed navigation.

Cell Rep. 2022 Oct 11;41(2):111470. doi: 10.1016/j.celrep.2022.111470.

Dopamine Modulates Adaptive Prediction Error Coding in the Human Midbrain and Striatum.

J Neurosci. 2017 Feb 15;37(7):1708-1720. doi: 10.1523/JNEUROSCI.1979-16.2016.

Dopamine release plateau and outcome signals in dorsal striatum contrast with classic reinforcement learning formulations.

Nat Commun. 2024 Oct 14;15(1):8856. doi: 10.1038/s41467-024-53176-7.

Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework.

Elife. 2016 Mar 7;5:e13665. doi: 10.7554/eLife.13665.

The cost of obtaining rewards enhances the reward prediction error signal of midbrain dopamine neurons.

Nat Commun. 2019 Aug 15;10(1):3674. doi: 10.1038/s41467-019-11334-2.

A causal link between prediction errors, dopamine neurons and learning.

Nat Neurosci. 2013 Jul;16(7):966-73. doi: 10.1038/nn.3413. Epub 2013 May 26.

Cited by

Tonic dopamine and biases in value learning linked through a biologically inspired reinforcement learning model.

Nat Commun. 2025 Aug 13;16(1):7529. doi: 10.1038/s41467-025-62280-1.

Individual differences in decision-making shape how mesolimbic dopamine regulates choice confidence and change-of-mind.

Nat Neurosci. 2025 Jul 30. doi: 10.1038/s41593-025-02015-z.

Trial-by-trial learning of successor representations in human behavior.

bioRxiv. 2025 Jun 16:2024.11.07.622528. doi: 10.1101/2024.11.07.622528.

Prospective contingency explains behavior and dopamine signals during associative learning.

Nat Neurosci. 2025 Mar 18. doi: 10.1038/s41593-025-01915-4.

Predictive reward-prediction errors of climbing fiber inputs integrate modular reinforcement learning with supervised learning.

PLoS Comput Biol. 2025 Mar 17;21(3):e1012899. doi: 10.1371/journal.pcbi.1012899. eCollection 2025 Mar.

Interpretable deep learning for deconvolutional analysis of neural signals.

Neuron. 2025 Apr 16;113(8):1151-1168.e13. doi: 10.1016/j.neuron.2025.02.006. Epub 2025 Mar 12.

A statistical framework for analysis of trial-level temporal dynamics in fiber photometry experiments.

Elife. 2025 Mar 12;13:RP95802. doi: 10.7554/eLife.95802.

Contextual cues facilitate dynamic value encoding in the mesolimbic dopamine system.

Curr Biol. 2025 Feb 24;35(4):746-760.e5. doi: 10.1016/j.cub.2024.12.031. Epub 2025 Jan 23.

Distributed representations of temporally accumulated reward prediction errors in the mouse cortex.

Sci Adv. 2025 Jan 24;11(4):eadi4782. doi: 10.1126/sciadv.adi4782. Epub 2025 Jan 22.

"PyTDL": A versatile temporal difference learning algorithm to simulate behavior process of decision making and cognitive learning.

iScience. 2024 Dec 14;28(1):111600. doi: 10.1016/j.isci.2024.111600. eCollection 2025 Jan 17.

