Dopamine signals as temporal difference errors: recent advances.

Affiliation

Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA.

Publication information

Curr Opin Neurobiol. 2021 Apr;67:95-105. doi: 10.1016/j.conb.2020.08.014. Epub 2020 Nov 10.

DOI:10.1016/j.conb.2020.08.014

PMID:33186815

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8107188/

Abstract

In the brain, dopamine is thought to drive reward-based learning by signaling temporal difference reward prediction errors (TD errors), a 'teaching signal' also used to train computers. Recent studies using optogenetic manipulations have provided multiple lines of evidence that phasic dopamine signals function as TD errors. Furthermore, novel experimental results indicate that when the current state of the environment is uncertain, dopamine neurons compute TD errors using 'belief states', that is, a probability distribution over potential states. It remains unclear how belief states are computed, but emerging evidence suggests involvement of the prefrontal cortex and the hippocampus. These results refine our understanding of the role of dopamine in learning and of the algorithms by which dopamine functions in the brain.
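
The two computations named in the abstract can be illustrated with a minimal sketch. It is not taken from the review: the function names, the discount factor `gamma`, the linear value readout `weights`, and the two-state example are all assumptions chosen for illustration. The TD error is the standard form delta = r + gamma * V(next) - V(now), and the belief state is updated with a textbook Bayes-filter step.

```python
import numpy as np


def td_error(reward, value_now, value_next, gamma=0.9):
    """Standard TD error: delta = r + gamma * V(next) - V(now)."""
    return reward + gamma * value_next - value_now


def belief_update(belief, transition, likelihood):
    """One Bayes-filter step over hidden states.

    transition[i, j] is assumed to be P(next state = j | state = i);
    likelihood[j] is assumed to be P(observation | state = j).
    """
    predicted = transition.T @ belief      # propagate belief through dynamics
    posterior = likelihood * predicted     # reweight by the new observation
    return posterior / posterior.sum()     # renormalize to a distribution


def belief_value(belief, weights):
    """Value of a belief state as the expected value over hidden states."""
    return float(np.dot(belief, weights))


# Hypothetical two-state example: the animal is unsure which state it is in.
weights = np.array([0.0, 1.0])             # assumed learned value per state
belief_now = np.array([0.5, 0.5])          # maximal uncertainty
transition = np.eye(2)                     # states assumed to persist
likelihood = np.array([0.2, 0.8])          # observation favors state 1

belief_next = belief_update(belief_now, transition, likelihood)
delta = td_error(
    reward=0.0,
    value_now=belief_value(belief_now, weights),
    value_next=belief_value(belief_next, weights),
)
print(belief_next, delta)
```

In this sketch an observation that shifts the belief toward the more valuable state produces a positive TD error even though no reward was delivered, which is the kind of effect the belief-state account is meant to capture.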

Similar articles

Dopamine reward prediction errors reflect hidden-state inference across time.

Nat Neurosci. 2017 Apr;20(4):581-589. doi: 10.1038/nn.4520. Epub 2017 Mar 6.

Abnormal temporal difference reward-learning signals in major depression.

Brain. 2008 Aug;131(Pt 8):2084-93. doi: 10.1093/brain/awn136. Epub 2008 Jun 25.

Dopamine errors drive excitatory and inhibitory components of backward conditioning in an outcome-specific manner.

Curr Biol. 2022 Jul 25;32(14):3210-3218.e3. doi: 10.1016/j.cub.2022.06.035. Epub 2022 Jun 24.

Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework.

Elife. 2016 Mar 7;5:e13665. doi: 10.7554/eLife.13665.

Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies.

J Neurosci. 2017 Aug 30;37(35):8315-8329. doi: 10.1523/JNEUROSCI.1221-17.2017. Epub 2017 Jul 24.

Midbrain dopamine neurons signal phasic and ramping reward prediction error during goal-directed navigation.

Cell Rep. 2022 Oct 11;41(2):111470. doi: 10.1016/j.celrep.2022.111470.

A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning.

Nat Neurosci. 2022 Aug;25(8):1082-1092. doi: 10.1038/s41593-022-01109-2. Epub 2022 Jul 7.

Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time.

Nat Commun. 2024 Jul 12;15(1):5856. doi: 10.1038/s41467-024-50205-3.

Rethinking dopamine as generalized prediction error.

Proc Biol Sci. 2018 Nov 21;285(1891):20181645. doi: 10.1098/rspb.2018.1645.

Cited by

Modern Day High: The Neurocognitive Impact of Social Media Usage.

Cureus. 2025 Jul 8;17(7):e87496. doi: 10.7759/cureus.87496. eCollection 2025 Jul.

Blinking indexes dynamic attending during and after music listening.

Sci Rep. 2025 Jul 26;15(1):27262. doi: 10.1038/s41598-025-12200-6.

Nucleus accumbens dopamine release reflects Bayesian inference during instrumental learning.

PLoS Comput Biol. 2025 Jul 2;21(7):e1013226. doi: 10.1371/journal.pcbi.1013226. eCollection 2025 Jul.

Reaching vigor tracks learned prediction error.

bioRxiv. 2025 Mar 25:2025.03.24.645035. doi: 10.1101/2025.03.24.645035.

Prospective contingency explains behavior and dopamine signals during associative learning.

Nat Neurosci. 2025 Mar 18. doi: 10.1038/s41593-025-01915-4.

Predictive reward-prediction errors of climbing fiber inputs integrate modular reinforcement learning with supervised learning.

PLoS Comput Biol. 2025 Mar 17;21(3):e1012899. doi: 10.1371/journal.pcbi.1012899. eCollection 2025 Mar.

Constructing biologically constrained RNNs via Dale's backprop and topologically-informed pruning.

bioRxiv. 2025 Jan 13:2025.01.09.632231. doi: 10.1101/2025.01.09.632231.

The curious case of dopaminergic prediction errors and learning associative information beyond value.

Nat Rev Neurosci. 2025 Mar;26(3):169-178. doi: 10.1038/s41583-024-00898-8. Epub 2025 Jan 8.

Independent operations of appetitive and aversive conditioning systems lead to simultaneous production of conflicting memories in an insect.

Proc Biol Sci. 2024 Sep;291(2031):20241273. doi: 10.1098/rspb.2024.1273. Epub 2024 Sep 25.

Explaining dopamine through prediction errors and beyond.

Nat Neurosci. 2024 Sep;27(9):1645-1655. doi: 10.1038/s41593-024-01705-4. Epub 2024 Jul 25.
