
States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.

Affiliation

Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91101, USA.

Publication information

Neuron. 2010 May 27;66(4):585-95. doi: 10.1016/j.neuron.2010.04.016.

Abstract

Reinforcement learning (RL) uses sequential experience with situations ("states") and outcomes to assess actions. Whereas model-free RL uses this experience directly, in the form of a reward prediction error (RPE), model-based RL uses it indirectly, building a model of the state transition and outcome structure of the environment, and evaluating actions by searching this model. A state prediction error (SPE) plays a central role, reporting discrepancies between the current model and the observed state transitions. Using functional magnetic resonance imaging in humans solving a probabilistic Markov decision task, we found the neural signature of an SPE in the intraparietal sulcus and lateral prefrontal cortex, in addition to the previously well-characterized RPE in the ventral striatum. This finding supports the existence of two unique forms of learning signal in humans, which may form the basis of distinct computational strategies for guiding behavior.
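
The abstract gives no equations, so the following is only an illustrative sketch of how the two error signals differ. It assumes a SARSA-style value update for the model-free learner and a simple delta-rule update of transition probabilities for the model-based learner. All names and parameters (`rpe_update`, `spe_update`, `alpha`, `gamma`, `eta`) are hypothetical and are not taken from the abstract.

```python
def rpe_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0):
    """Model-free step: adjust the value of (s, a) by a reward prediction error.

    Q is a list of lists indexed as Q[state][action] (assumed layout).
    """
    # RPE: received reward plus discounted next value, minus the current estimate.
    rpe = r + gamma * Q[s_next][a_next] - Q[s][a]
    Q[s][a] += alpha * rpe
    return rpe


def spe_update(T, s, a, s_next, eta=0.1):
    """Model-based step: adjust transition estimates by a state prediction error.

    T is indexed as T[state][action][next_state]; each T[state][action]
    row is assumed to be a probability distribution over next states.
    """
    # SPE: how unexpected the observed transition was under the current model.
    spe = 1.0 - T[s][a][s_next]
    row = T[s][a]
    # Shrink every entry, then add eta to the observed state. This equals
    # T += eta * spe for the observed state and keeps the row summing to 1.
    for k in range(len(row)):
        row[k] *= (1.0 - eta)
    row[s_next] += eta
    return spe
```

Note that `spe_update` never looks at reward: under these assumptions, the transition model can be learned from state transitions alone, whereas `rpe_update` changes values only when reward (or predicted value) deviates from expectation.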




