Doll Bradley B, Jacobs W Jake, Sanfey Alan G, Frank Michael J
Department of Cognitive and Linguistic Sciences, Department of Psychology, Brown University, USA.
Brain Res. 2009 Nov 24;1299:74-94. doi: 10.1016/j.brainres.2009.07.007. Epub 2009 Aug 3.
Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
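The confirmation-bias mechanism favored by the model fits can be sketched as a Q-learner whose learning rate is amplified for instruction-consistent outcomes and diminished for inconsistent ones. This is a minimal illustration, not the authors' exact parameterization: the stimulus names, reward probabilities, and the `amplify`/`diminish` scaling factors below are hypothetical.

```python
import math
import random

def simulate(n_trials=1000, p_a=0.3, p_b=0.7, instructed="A",
             alpha=0.2, amplify=2.0, diminish=0.25, beta=5.0, seed=0):
    """Q-learning with an instruction-driven confirmation bias.

    The agent is (falsely) instructed that stimulus `instructed` has the
    highest reinforcement probability. Outcomes that confirm the
    instruction are learned from with an amplified rate; outcomes that
    contradict it, with a diminished rate. All parameter names here are
    illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    q = {"A": 0.5, "B": 0.5}          # initial action values
    p = {"A": p_a, "B": p_b}          # true reinforcement probabilities
    choices = []
    for _ in range(n_trials):
        # Softmax choice between the two stimuli.
        ea = math.exp(beta * q["A"])
        eb = math.exp(beta * q["B"])
        choice = "A" if rng.random() < ea / (ea + eb) else "B"
        reward = 1.0 if rng.random() < p[choice] else 0.0
        # Confirmation bias: for the instructed stimulus, rewards are
        # amplified and omissions diminished; the reverse holds for the
        # uninstructed stimulus.
        if choice == instructed:
            lr = alpha * (amplify if reward == 1.0 else diminish)
        else:
            lr = alpha * (diminish if reward == 1.0 else amplify)
        q[choice] += lr * (reward - q[choice])
        choices.append(choice)
    return choices, q

choices, q = simulate()
```

With these settings the learned value of the instructed stimulus settles near `amplify*p / (amplify*p + diminish*(1-p))`, so the agent keeps choosing "A" even though it is reinforced on only 30% of trials, mirroring the instructed subjects' behavior.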