

States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.

Affiliation

Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91101, USA.

Publication information

Neuron. 2010 May 27;66(4):585-95. doi: 10.1016/j.neuron.2010.04.016.

DOI: 10.1016/j.neuron.2010.04.016
PMID: 20510862
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2895323/
Abstract

Reinforcement learning (RL) uses sequential experience with situations ("states") and outcomes to assess actions. Whereas model-free RL uses this experience directly, in the form of a reward prediction error (RPE), model-based RL uses it indirectly, building a model of the state transition and outcome structure of the environment, and evaluating actions by searching this model. A state prediction error (SPE) plays a central role, reporting discrepancies between the current model and the observed state transitions. Using functional magnetic resonance imaging in humans solving a probabilistic Markov decision task, we found the neural signature of an SPE in the intraparietal sulcus and lateral prefrontal cortex, in addition to the previously well-characterized RPE in the ventral striatum. This finding supports the existence of two unique forms of learning signal in humans, which may form the basis of distinct computational strategies for guiding behavior.
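The abstract contrasts a reward prediction error (RPE), which updates action values directly, with a state prediction error (SPE), which updates a learned model of state transitions that is then searched to evaluate actions. The Python sketch below illustrates that distinction on a small tabular task. It is a minimal sketch under stated assumptions, not the authors' fitted model: the SARSA-style form of the RPE, the definition SPE = 1 - T(s, a, s'), the learning rate `eta`, the discount `gamma`, the arrival-state reward list `R`, and all function names are illustrative choices.

```python
"""Sketch of the two learning signals contrasted in the abstract.

Assumptions (illustrative, not taken from the paper): a small tabular task,
Q[s][a] action values for the model-free learner, T[s][a][s2] estimated
transition probabilities for the model-based learner, and R[s2] a reward
received on arriving in state s2. All function names are hypothetical.
"""


def reward_prediction_error(Q, s, a, r, s_next, a_next, gamma=0.9):
    """Model-free RPE (SARSA-style): reward plus discounted next value, minus current value."""
    return r + gamma * Q[s_next][a_next] - Q[s][a]


def state_prediction_error(T, s, a, s_next):
    """SPE: how surprising the observed successor is under the current model."""
    return 1.0 - T[s][a][s_next]


def update_transition_model(T, s, a, s_next, eta=0.1):
    """Move T[s][a] toward the observed successor and return the SPE.

    Shrinking every entry by (1 - eta) and then adding eta to the observed
    successor keeps the row summing to 1.
    """
    spe = state_prediction_error(T, s, a, s_next)
    row = T[s][a]
    for k in range(len(row)):
        row[k] *= 1.0 - eta
    row[s_next] += eta  # net change for s_next is eta * spe
    return spe


def model_based_values(T, R, gamma=0.9, n_iter=50):
    """Evaluate actions by searching the learned model (value iteration)."""
    n_states, n_actions = len(T), len(T[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iter):
        V = [max(q) for q in Q]  # greedy value of each state
        Q = [
            [
                sum(p * (R[t] + gamma * V[t]) for t, p in enumerate(T[s][a]))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
    return Q


if __name__ == "__main__":
    # Hypothetical toy task: state 0 is the choice state; states 1 and 2 are
    # terminal (all-zero transition rows); only arriving in state 1 pays off.
    T = [[[0.0, 0.5, 0.5], [0.0, 0.5, 0.5]],
         [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
         [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
    R = [0.0, 1.0, 0.0]
    Q_mf = [[0.0, 0.0] for _ in range(3)]

    # One observed step: action 0 in state 0 leads to state 1 with reward 1.
    rpe = reward_prediction_error(Q_mf, 0, 0, 1.0, 1, 0)
    spe = update_transition_model(T, 0, 0, 1)
    print("RPE:", rpe, "SPE:", spe, "model-based Q(0, .):", model_based_values(T, R)[0])
```

For example, with an initial estimate of 0.5 for each of two successors, observing one of them yields an SPE of 0.5 and, with eta = 0.1, raises its estimated probability to 0.55 while lowering the other to 0.45. The model-based learner then prefers the action whose learned transitions lead more often to the rewarded state, even though no reward-driven value update was applied to that action.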


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d5/2895323/4415b0908e6d/nihms-199499-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d5/2895323/b85d8f889123/nihms-199499-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d5/2895323/52046da7cd48/nihms-199499-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d5/2895323/ea2481e17442/nihms-199499-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d5/2895323/aa1743437012/nihms-199499-f0005.jpg

Similar articles

1. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.
   Neuron. 2010 May 27;66(4):585-95. doi: 10.1016/j.neuron.2010.04.016.
2. The involvement of model-based but not model-free learning signals during observational reward learning in the absence of choice.
   J Neurophysiol. 2016 Jun 1;115(6):3195-203. doi: 10.1152/jn.00046.2016. Epub 2016 Apr 6.
3. Beta Oscillations in Monkey Striatum Encode Reward Prediction Error Signals.
   J Neurosci. 2023 May 3;43(18):3339-3352. doi: 10.1523/JNEUROSCI.0952-22.2023. Epub 2023 Apr 4.
4. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.
   J Neurosci. 2007 Nov 21;27(47):12860-7. doi: 10.1523/JNEUROSCI.2496-07.2007.
5. Effects of Ventral Striatum Lesions on Stimulus-Based versus Action-Based Reinforcement Learning.
   J Neurosci. 2017 Jul 19;37(29):6902-6914. doi: 10.1523/JNEUROSCI.0631-17.2017. Epub 2017 Jun 16.
6. A Neurocomputational Account of How Inflammation Enhances Sensitivity to Punishments Versus Rewards.
   Biol Psychiatry. 2016 Jul 1;80(1):73-81. doi: 10.1016/j.biopsych.2015.07.018. Epub 2015 Aug 1.
7. The ubiquity of model-based reinforcement learning.
   Curr Opin Neurobiol. 2012 Dec;22(6):1075-81. doi: 10.1016/j.conb.2012.08.003. Epub 2012 Sep 6.
8. One-shot learning and behavioral eligibility traces in sequential decision making.
   Elife. 2019 Nov 11;8:e47463. doi: 10.7554/eLife.47463.
9. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.
   J Cogn Neurosci. 2014 Mar;26(3):635-44. doi: 10.1162/jocn_a_00509. Epub 2013 Oct 29.
10. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.
   Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.

Cited by

1. Neural correlates of reduced sensitivity to information about uncertainty during valuation in older adults: An fNIRS study.
   Imaging Neurosci (Camb). 2025 Jun 24;3. doi: 10.1162/IMAG.a.61. eCollection 2025.
2. Non-invasive Ultrasonic Neuromodulation of the Human Nucleus Accumbens Impacts Reward Sensitivity.
   bioRxiv. 2025 Aug 6:2024.07.25.605068. doi: 10.1101/2024.07.25.605068.
3. Reconciling flexibility and efficiency: medial entorhinal cortex represents a compositional cognitive map.
   Nat Commun. 2025 Aug 12;16(1):7444. doi: 10.1038/s41467-025-62733-7.
4. Higher-order and distributed synergistic functional interactions encode information gain in goal-directed learning.
   Nat Commun. 2025 Aug 5;16(1):7179. doi: 10.1038/s41467-025-62507-1.
5. Modelling cognitive flexibility with deep neural networks.
   Curr Opin Behav Sci. 2024 Jun;57:101361. doi: 10.1016/j.cobeha.2024.101361.
6. Cognitive computational model reveals repetition bias in a sequential decision-making task.
   Commun Psychol. 2025 Jun 13;3(1):92. doi: 10.1038/s44271-025-00271-0.
7. The Cerebellum and Striatum in Reward Processing: Caring About Being Right vs. Caring About Reward.
   bioRxiv. 2025 Jun 5:2025.06.04.657253. doi: 10.1101/2025.06.04.657253.
8. Higher motivation and pleasure scores predict more reliance on model-free decision making.
   Cogn Affect Behav Neurosci. 2025 May 22. doi: 10.3758/s13415-025-01302-3.
9. Computational modelling and neural correlates of reinforcement learning following three-week escitalopram: a double-blind, placebo-controlled semi-randomised study.
   Transl Psychiatry. 2025 May 21;15(1):175. doi: 10.1038/s41398-025-03392-6.
10. Transition ability to safe states reduces fear responses to height.
   Proc Natl Acad Sci U S A. 2025 May 20;122(20):e2416920122. doi: 10.1073/pnas.2416920122. Epub 2025 May 13.