Mesolimbic dopamine adapts the rate of learning from action.

Affiliation

Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA.

Publication information

Nature. 2023 Feb;614(7947):294-302. doi: 10.1038/s41586-022-05614-z. Epub 2023 Jan 18.

DOI: 10.1038/s41586-022-05614-z
PMID: 36653450
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9908546/

Abstract

Recent success in training artificial agents and robots derives from a combination of direct learning of behavioural policies and indirect learning through value functions. Policy learning and value learning use distinct algorithms that optimize behavioural performance and reward prediction, respectively. In animals, behavioural learning and the role of mesolimbic dopamine signalling have been extensively evaluated with respect to reward prediction; however, so far there has been little consideration of how direct policy learning might inform our understanding. Here we used a comprehensive dataset of orofacial and body movements to understand how behavioural policies evolved as naive, head-restrained mice learned a trace conditioning paradigm. Individual differences in initial dopaminergic reward responses correlated with the emergence of learned behavioural policy, but not the emergence of putative value encoding for a predictive cue. Likewise, physiologically calibrated manipulations of mesolimbic dopamine produced several effects inconsistent with value learning but predicted by a neural-network-based model that used dopamine signals to set an adaptive rate, not an error signal, for behavioural policy learning. This work provides strong evidence that phasic dopamine activity can regulate direct learning of behavioural policies, expanding the explanatory power of reinforcement learning models for animal learning.
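
The distinction the abstract draws, between dopamine acting as an error signal for value learning and dopamine setting an adaptive rate for policy learning, can be illustrated with a toy sketch. This is not the authors' neural-network model: the one-parameter policy, the quadratic performance measure, and every name below (`value_learning_step`, `policy_learning_step`, `run_policy`, `base_rate`) are assumptions made purely for illustration.

```python
def value_learning_step(value, reward, alpha=0.1):
    """Value learning (illustrative): the dopamine-like signal IS the error.

    The reward prediction error sets both the size and the sign of the update.
    """
    rpe = reward - value
    return value + alpha * rpe, rpe


def policy_learning_step(theta, target, dopamine, base_rate=0.05):
    """Policy learning with an adaptive rate (illustrative assumption).

    The update direction comes from a performance gradient, here for the
    hypothetical performance measure J(theta) = -(theta - target)**2.
    The dopamine-like signal only scales how large a step is taken.
    """
    grad = -2.0 * (theta - target)
    return theta + base_rate * dopamine * grad


def run_policy(dopamine, n_trials=20, theta0=0.0, target=1.0):
    """Run repeated policy updates with a fixed dopamine-like gain."""
    theta = theta0
    for _ in range(n_trials):
        theta = policy_learning_step(theta, target, dopamine)
    return theta


if __name__ == "__main__":
    for gain in (0.0, 0.5, 2.0):
        print(f"dopamine gain {gain}: theta after 20 trials = {run_policy(gain):.3f}")
```

In this sketch a larger dopamine-like gain moves the policy parameter towards the same target faster, and a gain of zero stops learning without reversing it. In the value-learning step, by contrast, the signal itself carries the sign of the update.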

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/c26394a75d78/41586_2022_5614_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/3f980febc6cd/41586_2022_5614_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/3b412dbe6062/41586_2022_5614_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/234d36a87811/41586_2022_5614_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/6f946e51527f/41586_2022_5614_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/f6f9e70c7cd9/41586_2022_5614_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/b72776754952/41586_2022_5614_Fig7_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/77bfeec920f4/41586_2022_5614_Fig8_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/ba6ce0672ecc/41586_2022_5614_Fig9_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/dcb1803306c5/41586_2022_5614_Fig10_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/57cee277b5d6/41586_2022_5614_Fig11_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/74b978a42cc9/41586_2022_5614_Fig12_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/4da963f529c2/41586_2022_5614_Fig13_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/900b981fefe5/41586_2022_5614_Fig14_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/315883670e8d/41586_2022_5614_Fig15_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86f5/9908546/7e1052f6858f/41586_2022_5614_Fig16_ESM.jpg

Similar articles

1
Mesolimbic dopamine adapts the rate of learning from action.
Nature. 2023 Feb;614(7947):294-302. doi: 10.1038/s41586-022-05614-z. Epub 2023 Jan 18.
2
Dissociable contributions of phasic dopamine activity to reward and prediction.
Cell Rep. 2021 Sep 7;36(10):109684. doi: 10.1016/j.celrep.2021.109684.
3
Dopamine errors drive excitatory and inhibitory components of backward conditioning in an outcome-specific manner.
Curr Biol. 2022 Jul 25;32(14):3210-3218.e3. doi: 10.1016/j.cub.2022.06.035. Epub 2022 Jun 24.
4
A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.
Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.
5
Dopamine release plateau and outcome signals in dorsal striatum contrast with classic reinforcement learning formulations.
Nat Commun. 2024 Oct 14;15(1):8856. doi: 10.1038/s41467-024-53176-7.
6
Dynamic shaping of dopamine signals during probabilistic Pavlovian conditioning.
Neurobiol Learn Mem. 2015 Jan;117:84-92. doi: 10.1016/j.nlm.2014.07.010. Epub 2014 Aug 27.
7
The emergence of saliency and novelty responses from Reinforcement Learning principles.
Neural Netw. 2008 Dec;21(10):1493-9. doi: 10.1016/j.neunet.2008.09.004. Epub 2008 Sep 25.
8
Spontaneous behaviour is structured by reinforcement without explicit reward.
Nature. 2023 Feb;614(7946):108-117. doi: 10.1038/s41586-022-05611-2. Epub 2023 Jan 18.
9
Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration.
PLoS Comput Biol. 2024 Apr 16;20(4):e1011516. doi: 10.1371/journal.pcbi.1011516. eCollection 2024 Apr.
10
Central oxytocin signaling inhibits food reward-motivated behaviors and VTA dopamine responses to food-predictive cues in male rats.
Horm Behav. 2020 Nov;126:104855. doi: 10.1016/j.yhbeh.2020.104855. Epub 2020 Oct 1.

Cited by

1
Sensitive dLight3 for imaging broad-spectrum dopamine events across brain regions.
Res Sq. 2025 Aug 20:rs.3.rs-7313638. doi: 10.21203/rs.3.rs-7313638/v1.
2
Mesolimbic dopamine ramps reflect environmental timescales.
Elife. 2025 Aug 29;13:RP98666. doi: 10.7554/eLife.98666.
3
Individual differences in decision-making shape how mesolimbic dopamine regulates choice confidence and change-of-mind.
Nat Neurosci. 2025 Jul 30. doi: 10.1038/s41593-025-02015-z.
4
Fast Penalized Generalized Estimating Equations for Large Longitudinal Functional Datasets.
ArXiv. 2025 Jun 25:arXiv:2506.20437v1.
5
Nucleus accumbens dopamine release reflects Bayesian inference during instrumental learning.
PLoS Comput Biol. 2025 Jul 2;21(7):e1013226. doi: 10.1371/journal.pcbi.1013226. eCollection 2025 Jul.
6
Striatal Dopamine Actions and Movement: Inferences from Parkinson Disease.
J Neurosci. 2025 Jun 11;45(24):e0022252025. doi: 10.1523/JNEUROSCI.0022-25.2025.
7
Dopaminergic action prediction errors serve as a value-free teaching signal.
Nature. 2025 May 14. doi: 10.1038/s41586-025-09008-9.
8
From avoidance to new action: the multifaceted role of the striatal indirect pathway.
Nat Rev Neurosci. 2025 May 7. doi: 10.1038/s41583-025-00925-2.
9
Functional Diversity of Serotonin Neurons in the Dorsal and Median Raphe Nuclei in Emotional Responses.
Neuropsychopharmacol Rep. 2025 Jun;45(2):e70015. doi: 10.1002/npr2.70015.
10
Hedonic eating is controlled by dopamine neurons that oppose GLP-1R satiety.
Science. 2025 Mar 28;387(6741):eadt0773. doi: 10.1126/science.adt0773.