Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

Authors

Doll Bradley B, Jacobs W Jake, Sanfey Alan G, Frank Michael J

Affiliations

Department of Cognitive and Linguistic Sciences, Department of Psychology, Brown University, USA.

Publication information

Brain Res. 2009 Nov 24;1299:74-94. doi: 10.1016/j.brainres.2009.07.007. Epub 2009 Aug 3.

DOI: 10.1016/j.brainres.2009.07.007
PMID: 19595993
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3050481/

Abstract

Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
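
The "confirmation bias" account in the abstract can be illustrated with a small Q-learning sketch. This is not the authors' fitted model: the function names, the single `bias` parameter, and the rule of multiplying the learning rate by `bias` for instruction-consistent prediction errors and dividing it by `bias` for inconsistent ones are all assumptions made for illustration. It covers only a stimulus instructed to be good, and it assumes rewards of 0 or 1 with `alpha * bias <= 1` so that values stay between 0 and 1.

```python
import math
import random


def update_q(q, reward, alpha, bias=1.0, instructed_good=False):
    """One Q-learning update with an optional, hypothetical confirmation bias.

    For a stimulus instructed to be good, positive prediction errors
    (instruction-consistent) are scaled up by `bias` and negative ones
    (instruction-inconsistent) are scaled down by 1 / bias.
    bias=1.0 recovers the standard update q + alpha * (reward - q).
    """
    delta = reward - q  # reward prediction error
    if instructed_good:
        gain = bias if delta > 0 else 1.0 / bias
    else:
        gain = 1.0
    return q + alpha * gain * delta


def softmax_choice_prob(q_a, q_b, beta):
    """Probability of choosing option A over option B under a softmax rule."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def simulate(p_reward, n_trials, alpha, bias, instructed_good, seed=0):
    """Learned value of one stimulus after repeated probabilistic feedback."""
    rng = random.Random(seed)
    q = 0.5  # neutral starting value (an assumption)
    for _ in range(n_trials):
        reward = 1.0 if rng.random() < p_reward else 0.0
        q = update_q(q, reward, alpha, bias, instructed_good)
    return q


if __name__ == "__main__":
    # Same 40%-rewarded stimulus, with and without a (false) "this one is best" instruction.
    unbiased = simulate(0.4, 200, alpha=0.1, bias=1.0, instructed_good=False)
    biased = simulate(0.4, 200, alpha=0.1, bias=3.0, instructed_good=True)
    print(f"uninstructed value: {unbiased:.3f}")
    print(f"instructed value:   {biased:.3f}")
    print(f"P(choose instructed): {softmax_choice_prob(biased, unbiased, beta=5.0):.3f}")
```

Because both runs use the same seed, they see the same reward sequence, so any gap between the two learned values comes only from the biased update. Under these assumptions the instructed stimulus ends up overvalued in the learned values themselves, which is the distinguishing feature of the "biased training" account as opposed to an override applied only at decision output.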

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed1/3050481/4724816370a8/nihms-274334-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed1/3050481/3c4535c9464f/nihms-274334-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed1/3050481/d84bf08da113/nihms-274334-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed1/3050481/300a3da0d34b/nihms-274334-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed1/3050481/e35a17f33d12/nihms-274334-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed1/3050481/5dcd57f0d42d/nihms-274334-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed1/3050481/e84959ae5437/nihms-274334-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed1/3050481/357338276524/nihms-274334-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed1/3050481/8f1b6dadb6e1/nihms-274334-f0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed1/3050481/6ed726eb2912/nihms-274334-f0010.jpg

Similar articles

1
Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.
Brain Res. 2009 Nov 24;1299:74-94. doi: 10.1016/j.brainres.2009.07.007. Epub 2009 Aug 3.
2
Reward-dependent learning in neuronal networks for planning and decision making.
Prog Brain Res. 2000;126:217-29. doi: 10.1016/S0079-6123(00)26016-0.
3
A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.
Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.
4
Navigating complex decision spaces: Problems and paradigms in sequential choice.
Psychol Bull. 2014 Mar;140(2):466-86. doi: 10.1037/a0033455. Epub 2013 Jul 8.
5
Generalization of value in reinforcement learning by humans.
Eur J Neurosci. 2012 Apr;35(7):1092-104. doi: 10.1111/j.1460-9568.2012.08017.x.
6
On the Role of Cortex-Basal Ganglia Interactions for Category Learning: A Neurocomputational Approach.
J Neurosci. 2018 Oct 31;38(44):9551-9562. doi: 10.1523/JNEUROSCI.0874-18.2018. Epub 2018 Sep 18.
7
Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies.
J Neurosci. 2017 Aug 30;37(35):8315-8329. doi: 10.1523/JNEUROSCI.1221-17.2017. Epub 2017 Jul 24.
8
Dopamine-mediated reinforcement learning signals in the striatum and ventromedial prefrontal cortex underlie value-based choices.
J Neurosci. 2011 Feb 2;31(5):1606-13. doi: 10.1523/JNEUROSCI.3904-10.2011.
9
Computational perspectives on forebrain microcircuits implicated in reinforcement learning, action selection, and cognitive control.
Neural Netw. 2009 Jul-Aug;22(5-6):757-65. doi: 10.1016/j.neunet.2009.06.008. Epub 2009 Jun 30.
10
Mechanisms of hierarchical reinforcement learning in corticostriatal circuits 1: computational analysis.
Cereb Cortex. 2012 Mar;22(3):509-26. doi: 10.1093/cercor/bhr114. Epub 2011 Jun 21.

Cited by

1
How working memory and reinforcement learning interact when avoiding punishment and pursuing reward concurrently.
J Exp Psychol Gen. 2025 Sep 1. doi: 10.1037/xge0001817.
2
Behavioral phenotyping identifies autism-like repetitive stereotypies in a Tsc2 haploinsufficient rat model.
Behav Brain Funct. 2025 Jul 3;21(1):20. doi: 10.1186/s12993-025-00284-z.
3
Transmission of societal stereotypes to individual-level prejudice through instrumental learning.
Proc Natl Acad Sci U S A. 2024 Nov 5;121(45):e2414518121. doi: 10.1073/pnas.2414518121. Epub 2024 Nov 1.
4
Comparing experience- and description-based economic preferences across 11 countries.
Nat Hum Behav. 2024 Aug;8(8):1554-1567. doi: 10.1038/s41562-024-01894-9. Epub 2024 Jun 14.
5
The challenge of learning adaptive mental behavior.
J Psychopathol Clin Sci. 2024 Jul;133(5):413-426. doi: 10.1037/abn0000924. Epub 2024 May 30.
6
Disentangling the contribution of individual and social learning processes in human advice-taking behavior.
NPJ Sci Learn. 2024 Jan 20;9(1):4. doi: 10.1038/s41539-024-00214-0.
7
Prefrontal signals precede striatal signals for biased credit assignment in motivational learning biases.
Nat Commun. 2024 Jan 2;15(1):19. doi: 10.1038/s41467-023-44632-x.
8
The shadowing effect of initial expectation on learning asymmetry.
PLoS Comput Biol. 2023 Jul 24;19(7):e1010751. doi: 10.1371/journal.pcbi.1010751. eCollection 2023 Jul.
9
Human ventromedial prefrontal cortex lesions enhance the effect of expectations on pain perception.
Cortex. 2023 Sep;166:188-206. doi: 10.1016/j.cortex.2023.04.017. Epub 2023 Jun 9.
10
Explaining the description-experience gap in risky decision-making: learning and memory retention during experience as causal mechanisms.
Cogn Affect Behav Neurosci. 2023 Jun;23(3):557-577. doi: 10.3758/s13415-023-01099-z. Epub 2023 Jun 8.

References

1
Computational models for the combination of advice and individual learning.
Cogn Sci. 2009 Mar;33(2):206-42. doi: 10.1111/j.1551-6709.2009.01010.x.
2
Bayesian approaches to associative learning: from passive to active learning.
Learn Behav. 2008 Aug;36(3):210-26. doi: 10.3758/lb.36.3.210.
3
Genetically determined differences in learning from errors.
Science. 2007 Dec 7;318(5856):1642-5. doi: 10.1126/science.1145044.
4
Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.
J Neurosci. 2007 Nov 21;27(47):12860-7. doi: 10.1523/JNEUROSCI.2496-07.2007.
5
Hold your horses: impulsivity, deep brain stimulation, and medication in parkinsonism.
Science. 2007 Nov 23;318(5854):1309-12. doi: 10.1126/science.1146157. Epub 2007 Oct 25.
6
Dopamine modulation of hippocampal-prefrontal cortical interaction drives memory-guided behavior.
Cereb Cortex. 2008 Jun;18(6):1407-14. doi: 10.1093/cercor/bhm172. Epub 2007 Oct 12.
7
Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning.
Proc Natl Acad Sci U S A. 2007 Oct 9;104(41):16311-6. doi: 10.1073/pnas.0706111104. Epub 2007 Oct 3.
8
Multiple dopamine functions at different time courses.
Annu Rev Neurosci. 2007;30:259-88. doi: 10.1146/annurev.neuro.28.061604.135722.
9
Neural signature of fictive learning signals in a sequential investment task.
Proc Natl Acad Sci U S A. 2007 May 29;104(22):9493-8. doi: 10.1073/pnas.0608842104. Epub 2007 May 22.
10
Model-based fMRI and its application to reward learning and decision making.
Ann N Y Acad Sci. 2007 May;1104:35-53. doi: 10.1196/annals.1390.022. Epub 2007 Apr 7.