Suppr 超能文献


Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time.

Affiliations

Department of Bioengineering, Imperial College London, London, UK.

Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX, USA.

Publication

Nat Commun. 2024 Jul 12;15(1):5856. doi: 10.1038/s41467-024-50205-3.

DOI: 10.1038/s41467-024-50205-3
PMID: 38997276
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11245539/
Abstract

The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference learning (TD) learning, whereby certain units signal reward prediction errors (RPE). The TD algorithm has been traditionally mapped onto the dopaminergic system, as firing properties of dopamine neurons can resemble RPEs. However, certain predictions of TD learning are inconsistent with experimental results, and previous implementations of the algorithm have made unscalable assumptions regarding stimulus-specific fixed temporal bases. We propose an alternate framework to describe dopamine signaling in the brain, FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, dopamine release is similar, but not identical to RPE, leading to predictions that contrast to those of TD. While FLEX itself is a general theoretical framework, we describe a specific, biophysically plausible implementation, the results of which are consistent with a preponderance of both existing and reanalyzed experimental data.
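To make the framework being critiqued concrete, here is a minimal sketch of tabular TD(0) learning in a Pavlovian trace-conditioning trial, using the stimulus-specific fixed temporal basis (one value weight per post-cue timestep, the "complete serial compound" assumption) that the abstract calls unscalable. This is illustrative only, not the paper's FLEX model; all parameter values are arbitrary choices. The TD error plays the role classically ascribed to dopamine, and the sketch reproduces the textbook result that the error peak migrates from reward time to cue onset over training.

```python
# Minimal TD(0) sketch of trace conditioning with a fixed temporal basis
# (illustrative; NOT the paper's FLEX model).
import numpy as np

T = 10            # post-cue timesteps; reward of 1 delivered at trial end
gamma, alpha = 0.95, 0.1
w = np.zeros(T)   # one value weight per post-cue timestep (the fixed basis)

def run_trial(w):
    """One conditioning trial; returns the TD errors (dopamine-like RPE signal).
    Index 0 is the cue-onset transition from an unpredicted ITI state (value 0)."""
    delta = np.zeros(T + 1)
    delta[0] = gamma * w[0]            # cue onset: ITI value is 0, cue is a surprise
    for t in range(T - 1):
        delta[t + 1] = gamma * w[t + 1] - w[t]   # TD error within the trial
        w[t] += alpha * delta[t + 1]             # TD(0) update
    delta[T] = 1.0 - w[T - 1]          # reward delivered at the end of the trial
    w[T - 1] += alpha * delta[T]
    return delta

first = run_trial(w)                   # naive animal: RPE sits at reward time
for _ in range(2000):
    last = run_trial(w)                # trained animal: RPE sits at cue onset
print(int(np.argmax(first)), int(np.argmax(last)))
```

Note how the shift depends entirely on the pre-allocated per-timestep weights `w`: every distinct stimulus would need its own such basis, which is the scalability problem the abstract raises and that FLEX addresses with learned, plastic representations of time.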


Figures 1-8 (image links):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/693a/11245539/bdf90017b1e6/41467_2024_50205_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/693a/11245539/545744fa4d07/41467_2024_50205_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/693a/11245539/2a4e7a705811/41467_2024_50205_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/693a/11245539/92eabc8f3039/41467_2024_50205_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/693a/11245539/75c911691e5b/41467_2024_50205_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/693a/11245539/92877150e884/41467_2024_50205_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/693a/11245539/94e1b112819a/41467_2024_50205_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/693a/11245539/995bcd74621d/41467_2024_50205_Fig8_HTML.jpg

Similar Articles

1. Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time.
   Nat Commun. 2024 Jul 12;15(1):5856. doi: 10.1038/s41467-024-50205-3.
2. Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time.
   Res Sq. 2023 Sep 19:rs.3.rs-3289985. doi: 10.21203/rs.3.rs-3289985/v1.
3. An imperfect dopaminergic error signal can drive temporal-difference learning.
   PLoS Comput Biol. 2011 May;7(5):e1001133. doi: 10.1371/journal.pcbi.1001133. Epub 2011 May 12.
4. A Dual Role Hypothesis of the Cortico-Basal-Ganglia Pathways: Opponency and Temporal Difference Through Dopamine and Adenosine.
   Front Neural Circuits. 2019 Jan 7;12:111. doi: 10.3389/fncir.2018.00111. eCollection 2018.
5. Neuronal implementation of the temporal difference learning algorithm in the midbrain dopaminergic system.
   Proc Natl Acad Sci U S A. 2023 Nov 7;120(45):e2309015120. doi: 10.1073/pnas.2309015120. Epub 2023 Oct 30.
6. Rethinking dopamine as generalized prediction error.
   Proc Biol Sci. 2018 Nov 21;285(1891):20181645. doi: 10.1098/rspb.2018.1645.
7. Learning with reinforcement prediction errors in a model of the Drosophila mushroom body.
   Nat Commun. 2021 May 7;12(1):2569. doi: 10.1038/s41467-021-22592-4.
8. A dopamine mechanism for reward maximization.
   Proc Natl Acad Sci U S A. 2024 May 14;121(20):e2316658121. doi: 10.1073/pnas.2316658121. Epub 2024 May 8.
9. Dopamine reward prediction errors reflect hidden-state inference across time.
   Nat Neurosci. 2017 Apr;20(4):581-589. doi: 10.1038/nn.4520. Epub 2017 Mar 6.
10. Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework.
    Elife. 2016 Mar 7;5:e13665. doi: 10.7554/eLife.13665.

Cited By

1. Negative affect-driven impulsivity as hierarchical model-based overgeneralization.
   Trends Cogn Sci. 2025 May;29(5):407-420. doi: 10.1016/j.tics.2025.01.002. Epub 2025 Feb 6.
2. Dopamine release plateau and outcome signals in dorsal striatum contrast with classic reinforcement learning formulations.
   Nat Commun. 2024 Oct 14;15(1):8856. doi: 10.1038/s41467-024-53176-7.

References

1. Emergence of belief-like representations through reinforcement learning.
   PLoS Comput Biol. 2023 Sep 11;19(9):e1011067. doi: 10.1371/journal.pcbi.1011067. eCollection 2023 Sep.
2. Toward reproducible models of sequence learning: replication and analysis of a modular spiking network with reward-based learning.
   Front Integr Neurosci. 2023 Jun 15;17:935177. doi: 10.3389/fnint.2023.935177. eCollection 2023.
3. Mesolimbic dopamine adapts the rate of learning from action.
   Nature. 2023 Feb;614(7947):294-302. doi: 10.1038/s41586-022-05614-z. Epub 2023 Jan 18.
4. Mesolimbic dopamine release conveys causal associations.
   Science. 2022 Dec 23;378(6626):eabq6740. doi: 10.1126/science.abq6740.
5. A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning.
   Nat Neurosci. 2022 Aug;25(8):1082-1092. doi: 10.1038/s41593-022-01109-2. Epub 2022 Jul 7.
6. Norepinephrine potentiates and serotonin depresses visual cortical responses by transforming eligibility traces.
   Nat Commun. 2022 Jun 9;13(1):3202. doi: 10.1038/s41467-022-30827-1.
7. Choice-selective sequences dominate in cortical relative to thalamic inputs to NAc to support reinforcement learning.
   Cell Rep. 2022 May 17;39(7):110756. doi: 10.1016/j.celrep.2022.110756.
8. How do real animals account for the passage of time during associative learning?
   Behav Neurosci. 2022 Oct;136(5):383-391. doi: 10.1037/bne0000516. Epub 2022 Apr 28.
9. The role of state uncertainty in the dynamics of dopamine.
   Curr Biol. 2022 Mar 14;32(5):1077-1087.e9. doi: 10.1016/j.cub.2022.01.025. Epub 2022 Feb 2.
10. The learning of prospective and retrospective cognitive maps within neural circuits.
    Neuron. 2021 Nov 17;109(22):3552-3575. doi: 10.1016/j.neuron.2021.09.034. Epub 2021 Oct 21.