Signals in human striatum are appropriate for policy update rather than value prediction.

Affiliation

Department of Psychology and Center for Neural Science, New York University, New York, New York 10003, USA.

Publication information

J Neurosci. 2011 Apr 6;31(14):5504-11. doi: 10.1523/JNEUROSCI.6316-10.2011.

DOI:10.1523/JNEUROSCI.6316-10.2011

PMID:21471387

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3132551/

Abstract

Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is: what is the content of decision representations trained by the error signals? We used fMRI to monitor neural activity in a two-armed bandit counterfactual decision task that provided human subjects with information about forgone and obtained monetary outcomes so as to dissociate teaching signals that update expected values for each action, versus signals that train relative preferences between actions (a policy). The reward probabilities of both choices varied independently from each other. This specific design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or value-based methods such as Q-learning, where choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found human participants' choices were significantly influenced by obtained as well as forgone rewards from the previous trial. We also found subjects' blood oxygen level-dependent responses in striatum were modulated in opposite directions by the experienced and forgone rewards but not by reward expectancy. This neural pattern, as well as subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than prediction errors for updating separate action values.
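The contrast the abstract draws between the two learning schemes can be sketched in a few lines of Python. This is an illustrative toy, not the authors' fitted model: the function names, the learning rate `alpha`, the inverse temperature `beta`, the starting reward probabilities, and the random-walk step size are all assumptions introduced here. The value-based rule keeps a separate reward expectancy per option and moves each toward its own outcome, so its error terms depend on expectancy. The policy rule keeps one relative preference and shifts it by obtained minus forgone reward, so the two outcomes enter with opposite signs and no expectancy term appears.

```python
import math
import random


def choice_prob_a(pref_diff, beta=3.0):
    """Logistic (two-option softmax) probability of choosing option A."""
    return 1.0 / (1.0 + math.exp(-beta * pref_diff))


def value_update(q, chosen, r_obtained, r_forgone, alpha=0.2):
    """Value-based sketch: one prediction error per option (outcome minus expectancy)."""
    unchosen = 1 - chosen
    new_q = list(q)
    new_q[chosen] += alpha * (r_obtained - q[chosen])
    new_q[unchosen] += alpha * (r_forgone - q[unchosen])
    return new_q


def policy_update(w, chosen, r_obtained, r_forgone, alpha=0.2):
    """Policy sketch: w is the preference for A over B; no expectancy is stored."""
    sign = 1.0 if chosen == 0 else -1.0  # express the update relative to option A
    return w + alpha * sign * (r_obtained - r_forgone)


def simulate(n_trials=200, seed=0):
    """Run both learners on the same full-feedback two-armed bandit (hypothetical settings)."""
    rng = random.Random(seed)
    p = [0.7, 0.3]           # assumed starting reward probabilities
    q, w = [0.5, 0.5], 0.0   # value learner state, policy learner state
    for _ in range(n_trials):
        # Choices here follow the policy learner's preference.
        chosen = 0 if rng.random() < choice_prob_a(w) else 1
        rewards = [1.0 if rng.random() < p_i else 0.0 for p_i in p]
        r_obt, r_for = rewards[chosen], rewards[1 - chosen]
        q = value_update(q, chosen, r_obt, r_for)
        w = policy_update(w, chosen, r_obt, r_for)
        # Each option's reward probability drifts independently of the other.
        p = [min(0.9, max(0.1, p_i + rng.gauss(0.0, 0.02))) for p_i in p]
    return q, w
```

In this sketch, when both options pay the same amount on a trial, `policy_update` leaves the preference unchanged, whereas `value_update` still moves both expectancies toward the outcome. That difference is the kind of dissociation a counterfactual design makes testable.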

Similar articles

Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning.

Neural Netw. 2006 Oct;19(8):1242-54. doi: 10.1016/j.neunet.2006.06.007. Epub 2006 Sep 20.

Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.

J Neurosci. 2007 Nov 21;27(47):12860-7. doi: 10.1523/JNEUROSCI.2496-07.2007.

Overlapping prediction errors in dorsal striatum during instrumental learning with juice and money reward in the human brain.

J Neurophysiol. 2009 Dec;102(6):3384-91. doi: 10.1152/jn.91195.2008. Epub 2009 Sep 30.

Neural correlates of forward planning in a spatial decision task in humans.

J Neurosci. 2011 Apr 6;31(14):5526-39. doi: 10.1523/JNEUROSCI.4647-10.2011.

The contribution of striatal pseudo-reward prediction errors to value-based decision-making.

Neuroimage. 2019 Jun;193:67-74. doi: 10.1016/j.neuroimage.2019.02.052. Epub 2019 Mar 7.

Dopamine-mediated reinforcement learning signals in the striatum and ventromedial prefrontal cortex underlie value-based choices.

J Neurosci. 2011 Feb 2;31(5):1606-13. doi: 10.1523/JNEUROSCI.3904-10.2011.

Dorsal striatal-midbrain connectivity in humans predicts how reinforcements are used to guide decisions.

J Cogn Neurosci. 2009 Jul;21(7):1332-45. doi: 10.1162/jocn.2009.21092.

Neural regions that underlie reinforcement learning are also active for social expectancy violations.

Soc Neurosci. 2010;5(1):76-91. doi: 10.1080/17470910903135825.

Neural Signatures of Prediction Errors in a Decision-Making Task Are Modulated by Action Execution Failures.

Curr Biol. 2019 May 20;29(10):1606-1613.e5. doi: 10.1016/j.cub.2019.04.011. Epub 2019 May 2.

Cited by

Estimation-uncertainty affects decisions with and without learning opportunities.

Nat Commun. 2025 Jul 21;16(1):6706. doi: 10.1038/s41467-025-61960-2.

Assessing social anhedonia in a transdiagnostic sample: Insights from a computational psychiatry lens.

J Mood Anxiety Disord. 2024 Sep 17;8:100088. doi: 10.1016/j.xjmad.2024.100088. eCollection 2024 Dec.

Cross-species translational paradigms for assessing positive valence system as defined by the RDoC matrix.

J Neurochem. 2025 Jan;169(1):e16243. doi: 10.1111/jnc.16243. Epub 2024 Oct 28.

Greater ventral striatal functional connectivity in cigarette smokers relative to non-smokers across a spectrum of alcohol consumption.

Brain Imaging Behav. 2024 Oct;18(5):1121-1130. doi: 10.1007/s11682-024-00903-9. Epub 2024 Aug 6.

Distinct Action Signals by Subregions in the Nucleus Accumbens during STOP-Change Performance.

J Neurosci. 2024 Jul 17;44(29):e0020242024. doi: 10.1523/JNEUROSCI.0020-24.2024.

The roots of polarization in the individual reward system.

Proc Biol Sci. 2024 Feb 28;291(2017):20232011. doi: 10.1098/rspb.2023.2011.

Computational mechanisms underlying latent value updating of unchosen actions.

Sci Adv. 2023 Oct 20;9(42):eadi2704. doi: 10.1126/sciadv.adi2704.

The shadowing effect of initial expectation on learning asymmetry.

PLoS Comput Biol. 2023 Jul 24;19(7):e1010751. doi: 10.1371/journal.pcbi.1010751. eCollection 2023 Jul.

The functional form of value normalization in human reinforcement learning.

Elife. 2023 Jul 10;12:e83891. doi: 10.7554/eLife.83891.

The role of memory in counterfactual valuation.

J Exp Psychol Gen. 2023 Jun;152(6):1754-1767. doi: 10.1037/xge0001364. Epub 2023 May 18.

References

Neural correlates of forward planning in a spatial decision task in humans.

J Neurosci. 2011 Apr 6;31(14):5526-39. doi: 10.1523/JNEUROSCI.4647-10.2011.

Model-based influences on humans' choices and striatal prediction errors.

Neuron. 2011 Mar 24;69(6):1204-15. doi: 10.1016/j.neuron.2011.02.027.

How instructed knowledge modulates the neural systems of reward learning.

Proc Natl Acad Sci U S A. 2011 Jan 4;108(1):55-60. doi: 10.1073/pnas.1014938108. Epub 2010 Dec 20.

Selective impairment of prediction error signaling in human dorsolateral but not ventral striatum in Parkinson's disease patients: evidence from a model-based fMRI study.

Neuroimage. 2010 Jan 1;49(1):772-81. doi: 10.1016/j.neuroimage.2009.08.011. Epub 2009 Aug 12.

Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

Brain Res. 2009 Nov 24;1299:74-94. doi: 10.1016/j.brainres.2009.07.007. Epub 2009 Aug 3.

How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action.

Neuron. 2009 Jun 11;62(5):733-43. doi: 10.1016/j.neuron.2009.05.014.

A specific role for posterior dorsolateral striatum in human habit learning.

Eur J Neurosci. 2009 Jun;29(11):2225-32. doi: 10.1111/j.1460-9568.2009.06796.x. Epub 2009 May 21.

Decision theory, reinforcement learning, and the brain.

Cogn Affect Behav Neurosci. 2008 Dec;8(4):429-53. doi: 10.3758/CABN.8.4.429.

Associative learning of social value.

Nature. 2008 Nov 13;456(7219):245-9. doi: 10.1038/nature07538.

Cortical mechanisms for reinforcement learning in competitive games.

Philos Trans R Soc Lond B Biol Sci. 2008 Dec 12;363(1511):3845-57. doi: 10.1098/rstb.2008.0158.
