Model-based and model-free Pavlovian reward learning: revaluation, revision, and revelation.

Authors

Dayan Peter, Berridge Kent C

Affiliation

Gatsby Computational Neuroscience Unit, University College London, London, UK

Publication details

Cogn Affect Behav Neurosci. 2014 Jun;14(2):473-92. doi: 10.3758/s13415-014-0277-8.

DOI: 10.3758/s13415-014-0277-8
PMID: 24647659
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4074442/
Abstract

Evidence supports at least two methods for learning about reward and punishment and making predictions for guiding actions. One method, called model-free, progressively acquires cached estimates of the long-run values of circumstances and actions from retrospective experience. The other method, called model-based, uses representations of the environment, expectations, and prospective calculations to make cognitive predictions of future value. Extensive attention has been paid to both methods in computational analyses of instrumental learning. By contrast, although a full computational analysis has been lacking, Pavlovian learning and prediction has typically been presumed to be solely model-free. Here, we revise that presumption and review compelling evidence from Pavlovian revaluation experiments showing that Pavlovian predictions can involve their own form of model-based evaluation. In model-based Pavlovian evaluation, prevailing states of the body and brain influence value computations, and thereby produce powerful incentive motivations that can sometimes be quite new. We consider the consequences of this revised Pavlovian view for the computational landscape of prediction, response, and choice. We also revisit differences between Pavlovian and instrumental learning in the control of incentive motivation.
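
The contrast the abstract draws between cached and prospective evaluation can be illustrated with a toy sketch. It assumes a single cue that always predicts one outcome, plus a made-up utility table in which the outcome's worth depends on bodily state. The function names, the outcome "salt", the two states, the learning rate, and all numbers are illustrative assumptions, not the authors' model.

```python
# Toy illustration only: every name and number here is hypothetical.

def outcome_value(outcome, body_state):
    """Hypothetical utility of an outcome under the current bodily state."""
    table = {
        ("salt", "normal"): -1.0,    # assumed aversive in the normal state
        ("salt", "depleted"): 1.0,   # assumed valuable in the depleted state
    }
    return table[(outcome, body_state)]


def train_model_free(n_trials, body_state, alpha=0.1):
    """Cache a value for the cue from experienced outcomes (delta-rule update)."""
    v_cue = 0.0
    for _ in range(n_trials):
        reward = outcome_value("salt", body_state)  # outcome as experienced
        v_cue += alpha * (reward - v_cue)           # prediction-error step
    return v_cue


def model_based_value(model, cue, body_state):
    """Predict the outcome from the model, then value it under the current state."""
    return sum(p * outcome_value(o, body_state) for o, p in model[cue].items())


# Learning happens entirely in the "normal" state.
model = {"cue": {"salt": 1.0}}            # learned cue -> outcome identity
cached = train_model_free(200, "normal")

# The bodily state then shifts, with no further experience of the outcome.
mf_after_shift = cached                                        # cache is unchanged
mb_after_shift = model_based_value(model, "cue", "depleted")   # recomputed now

print(f"model-free (cached) value after shift: {mf_after_shift:.3f}")
print(f"model-based value after shift:         {mb_after_shift:.3f}")
```

In this sketch the cached estimate stays negative after the state shift because it can only change through new experience, whereas the model-based evaluation turns positive immediately because it re-values the predicted outcome under the prevailing state.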

Similar articles

1. Model-based and model-free Pavlovian reward learning: revaluation, revision, and revelation.
Cogn Affect Behav Neurosci. 2014 Jun;14(2):473-92. doi: 10.3758/s13415-014-0277-8.
2. Reward-Mediated, Model-Free Reinforcement-Learning Mechanisms in Pavlovian and Instrumental Tasks Are Related.
J Neurosci. 2023 Jan 18;43(3):458-471. doi: 10.1523/JNEUROSCI.1113-22.2022. Epub 2022 Oct 10.
3. Differential dependence of Pavlovian incentive motivation and instrumental incentive learning processes on dopamine signaling.
Learn Mem. 2011 Jun 21;18(7):475-83. doi: 10.1101/lm.2229311. Print 2011.
4. Controllability governs the balance between Pavlovian and instrumental action selection.
Nat Commun. 2019 Dec 20;10(1):5826. doi: 10.1038/s41467-019-13737-7.
5. Learning and Motivational Processes Contributing to Pavlovian-Instrumental Transfer and Their Neural Bases: Dopamine and Beyond.
Curr Top Behav Neurosci. 2016;27:259-89. doi: 10.1007/7854_2015_388.
6. Age-dependent Pavlovian biases influence motor decision-making.
PLoS Comput Biol. 2018 Jul 6;14(7):e1006304. doi: 10.1371/journal.pcbi.1006304. eCollection 2018 Jul.
7. Reward and avoidance learning in the context of aversive environments and possible implications for depressive symptoms.
Psychopharmacology (Berl). 2019 Aug;236(8):2437-2449. doi: 10.1007/s00213-019-05299-9. Epub 2019 Jun 28.
8. Disentangling the roles of approach, activation and valence in instrumental and pavlovian responding.
PLoS Comput Biol. 2011 Apr;7(4):e1002028. doi: 10.1371/journal.pcbi.1002028. Epub 2011 Apr 21.
9. Modeling incentive salience in Pavlovian learning more parsimoniously using a multiple attribute model.
Cogn Affect Behav Neurosci. 2022 Apr;22(2):244-257. doi: 10.3758/s13415-021-00953-2. Epub 2021 Oct 21.
10. Single-response appetitive Pavlovian to instrumental transfer is suppressed by aversive counter-conditioning.
Q J Exp Psychol (Hove). 2019 Dec;72(12):2820-2832. doi: 10.1177/1747021819862996. Epub 2019 Jul 25.

Cited by

1. Selective engagement of prefrontal VIP neurons in reversal learning.
Sci Adv. 2025 Jul 25;11(30):eadt4945. doi: 10.1126/sciadv.adt4945. Epub 2025 Jul 23.
2. Feature identification learning both shapes and is shaped by spatial object-similarity representations.
Commun Psychol. 2025 May 13;3(1):77. doi: 10.1038/s44271-025-00259-w.
3. The reward positivity is insensitive to reinforcer devaluation.
bioRxiv. 2025 Apr 29:2025.03.27.645774. doi: 10.1101/2025.03.27.645774.
4. Prefrontal meta-control incorporating mental simulation enhances the adaptivity of reinforcement learning agents in dynamic environments.
Front Comput Neurosci. 2025 Mar 27;19:1559915. doi: 10.3389/fncom.2025.1559915. eCollection 2025.
5. Behavioral microanalyses refine sign-tracking characterization and uncover different response dynamics during omission and extinction learning.
Learn Mem. 2025 Mar 7;32(3). doi: 10.1101/lm.054065.124. Print 2025 Mar.
6. Devaluing memories of reward: a case for dopamine.
Commun Biol. 2025 Feb 3;8(1):161. doi: 10.1038/s42003-024-07440-7.
7. How childhood adversity affects components of decision making.
Neurosci Biobehav Rev. 2025 Feb;169:106027. doi: 10.1016/j.neubiorev.2025.106027. Epub 2025 Jan 25.
8. Contextual cues facilitate dynamic value encoding in the mesolimbic dopamine system.
Curr Biol. 2025 Feb 24;35(4):746-760.e5. doi: 10.1016/j.cub.2024.12.031. Epub 2025 Jan 23.
9. Reward Bases: A simple mechanism for adaptive acquisition of multiple reward types.
PLoS Comput Biol. 2024 Nov 19;20(11):e1012580. doi: 10.1371/journal.pcbi.1012580. eCollection 2024 Nov.
10. Proactive control for conflict resolution is intact in subclinical obsessive-compulsive individuals.
Front Psychol. 2024 Oct 22;15:1490147. doi: 10.3389/fpsyg.2024.1490147. eCollection 2024.
