Signals in human striatum are appropriate for policy update rather than value prediction.

Author information

Department of Psychology and Center for Neural Science, New York University, New York, New York 10003, USA.

Publication information

J Neurosci. 2011 Apr 6;31(14):5504-11. doi: 10.1523/JNEUROSCI.6316-10.2011.

Abstract

Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is: what is the content of decision representations trained by the error signals? We used fMRI to monitor neural activity in a two-armed bandit counterfactual decision task that provided human subjects with information about forgone and obtained monetary outcomes so as to dissociate teaching signals that update expected values for each action, versus signals that train relative preferences between actions (a policy). The reward probabilities of both choices varied independently from each other. This specific design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or value-based methods such as Q-learning, where choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found human participants' choices were significantly influenced by obtained as well as forgone rewards from the previous trial. We also found subjects' blood oxygen level-dependent responses in striatum were modulated in opposite directions by the experienced and forgone rewards but not by reward expectancy. This neural pattern, as well as subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than prediction errors for updating separate action values.
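
The contrast drawn in the abstract can be illustrated with a minimal sketch of the two kinds of update on one trial of a two-armed bandit with counterfactual feedback. This is not the authors' model: the function names, the learning rate `alpha`, and the exact update rules are assumptions chosen for illustration. The value-based version keeps a separate expected value per action, so its error terms depend on reward expectancy. The policy-based version keeps one relative preference and is driven by obtained minus forgone reward, with no expectancy term.

```python
import math

def value_update(q, chosen, r_obtained, r_forgone, alpha=0.1):
    """Value-based sketch (assumed delta rule): each action's expected
    value moves toward its own outcome, so each error depends on the
    current reward expectancy q[a]."""
    unchosen = 1 - chosen
    q = list(q)  # copy so the caller's list is not mutated
    q[chosen] += alpha * (r_obtained - q[chosen])
    q[unchosen] += alpha * (r_forgone - q[unchosen])
    return q

def policy_update(pref, chosen, r_obtained, r_forgone, alpha=0.1):
    """Policy-based sketch (assumed rule): a single relative preference
    for action 0 over action 1 is pushed by obtained minus forgone
    reward; reward expectancy does not enter the teaching signal."""
    sign = 1.0 if chosen == 0 else -1.0
    return pref + alpha * sign * (r_obtained - r_forgone)

def p_choose_0(pref):
    """Logistic mapping from relative preference to choice probability."""
    return 1.0 / (1.0 + math.exp(-pref))

# One trial: action 0 chosen and rewarded, action 1 would have paid nothing.
new_q = value_update([0.5, 0.5], chosen=0, r_obtained=1.0, r_forgone=0.0)
new_pref = policy_update(0.0, chosen=0, r_obtained=1.0, r_forgone=0.0)
```

In this sketch the policy teaching signal rises with the obtained reward and falls with the forgone reward, which is the opposite-direction pattern the abstract reports for striatal responses.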
