Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

Author information

Doll Bradley B, Jacobs W Jake, Sanfey Alan G, Frank Michael J

Affiliations

Department of Cognitive and Linguistic Sciences, Department of Psychology, Brown University, USA.

Publication information

Brain Res. 2009 Nov 24;1299:74-94. doi: 10.1016/j.brainres.2009.07.007. Epub 2009 Aug 3.

Abstract

Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
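The "confirmation bias" account above can be illustrated with a minimal Q-learning sketch. This is not the authors' fitted model, and several pieces are hypothetical choices made only for illustration:

- the instruction labels one stimulus as the best option;
- the bias works by multiplying the learning rate by `amp` for consistent outcomes (rewards) and by `damp` for inconsistent outcomes (non-rewards);
- the reward probabilities are 0.4 for the instructed stimulus and 0.6 for the alternative;
- choices follow a softmax rule with inverse temperature `beta`.

The names `biased_update`, `asymptotic_value` and `simulate` are likewise illustrative.

```python
import math
import random


def biased_update(q, stim, reward, instructed, alpha=0.1, amp=2.0, damp=0.5):
    """One delta-rule Q-learning update with a hypothetical confirmation-bias term.

    Assumes the instruction said `instructed` is the best option, so for that
    stimulus rewards (positive prediction errors) are amplified by `amp` and
    non-rewards (negative prediction errors) are diminished by `damp`.
    With amp = damp = 1 this reduces to plain Q-learning.
    """
    delta = reward - q[stim]  # reward prediction error
    lr = alpha
    if stim == instructed:
        lr *= amp if delta > 0 else damp
    q[stim] += lr * delta
    return q


def asymptotic_value(p_reward, amp=1.0, damp=1.0):
    """Fixed point of the expected biased update for a stimulus rewarded with probability p.

    Setting p*amp*(1 - Q) = (1 - p)*damp*Q gives Q* = p*amp / (p*amp + (1 - p)*damp).
    """
    num = p_reward * amp
    return num / (num + (1.0 - p_reward) * damp)


def simulate(p_instructed=0.4, p_other=0.6, amp=2.0, damp=0.5,
             alpha=0.1, beta=5.0, n_trials=20000, seed=0):
    """Two-choice task; returns the fraction of trials on which the instructed stimulus is chosen."""
    rng = random.Random(seed)
    q = {"instructed": 0.5, "other": 0.5}
    probs = {"instructed": p_instructed, "other": p_other}
    n_instructed = 0
    for _ in range(n_trials):
        # Softmax (logistic) choice between the two options.
        p_choose = 1.0 / (1.0 + math.exp(-beta * (q["instructed"] - q["other"])))
        stim = "instructed" if rng.random() < p_choose else "other"
        if stim == "instructed":
            n_instructed += 1
        # Probabilistic binary reward for the chosen stimulus.
        reward = 1.0 if rng.random() < probs[stim] else 0.0
        biased_update(q, stim, reward, "instructed", alpha, amp, damp)
    return n_instructed / n_trials


if __name__ == "__main__":
    print(asymptotic_value(0.4, amp=2.0, damp=0.5))  # 8/11, about 0.727
    print(simulate())                                # biased learner
    print(simulate(amp=1.0, damp=1.0))               # unbiased learner
```

The closed-form fixed point shows how the bias can make a worse instructed stimulus look better. With `amp=2` and `damp=0.5`, a stimulus rewarded 40% of the time converges to a value of 8/11 (about 0.73), which is above the 0.6 learned for the alternative. An unbiased learner converges to the true probabilities instead.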
