Guo Rong, Böhmer Wendelin, Hebart Martin, Chien Samson, Sommer Tobias, Obermayer Klaus, Gläscher Jan
Institute of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Berlin 10587, Germany,
Bernstein Center for Computational Neuroscience Berlin, Berlin 10115, Germany.
J Neurosci. 2016 Dec 14;36(50):12650-12660. doi: 10.1523/JNEUROSCI.1677-16.2016.
Goal-directed and instrumental learning are both important controllers of human behavior. Learning which stimulus events occur in the environment, and which rewards are associated with them, allows humans to seek out the most valuable stimuli and to move through the environment in a goal-directed manner. Stimulus-response associations are characteristic of instrumental learning, whereas response-outcome associations are the hallmark of goal-directed learning. Here we provide behavioral, computational, and neuroimaging results from a novel task in which stimulus-response and response-outcome associations are learned simultaneously but dominate behavior at different stages of the experiment. We found that prediction error representations in the ventral striatum depend on which type of learning dominates. Furthermore, the amygdala tracks the time-dependent weighting of stimulus-response versus response-outcome learning. Our findings suggest that the goal-directed and instrumental controllers dynamically engage the ventral striatum in representing prediction errors whenever one of them dominates choice behavior.
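A "time-dependent weighting" of two simultaneously learned controllers can be made concrete with a small sketch. It assumes, purely for illustration, that each controller holds its own action values and that a single weight w(t) mixes them on every trial before a softmax choice. The exponential schedule, its direction (response-outcome values dominating early), and the names `ro_weight`, `combined_values`, `w0`, `decay`, and `beta` are hypothetical. They are not the model fitted in this study.

```python
import math
from typing import List, Sequence


def ro_weight(trial: int, w0: float = 1.0, decay: float = 0.05) -> float:
    """Weight on the response-outcome controller at a given trial.

    Hypothetical schedule (an assumption, not the study's model):
    w(t) = w0 * exp(-decay * t), so response-outcome values dominate early
    and stimulus-response values take over as t grows.
    """
    return w0 * math.exp(-decay * trial)


def combined_values(q_ro: Sequence[float], q_sr: Sequence[float], trial: int,
                    w0: float = 1.0, decay: float = 0.05) -> List[float]:
    """Mix the two controllers' action values with the trial-dependent weight."""
    w = ro_weight(trial, w0, decay)
    return [w * ro + (1.0 - w) * sr for ro, sr in zip(q_ro, q_sr)]


def softmax(values: Sequence[float], beta: float = 3.0) -> List[float]:
    """Turn mixed values into choice probabilities (beta = inverse temperature)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    q_ro = [1.0, 0.0]  # hypothetical response-outcome values for two actions
    q_sr = [0.0, 1.0]  # hypothetical stimulus-response values for the same actions
    for t in (0, 20, 100):
        print(t, softmax(combined_values(q_ro, q_sr, t)))
```

Because the two value sets disagree in this toy example, choice probabilities shift from the first action to the second as the weight decays. This mirrors, in a deliberately simplified form, the idea that each controller dominates behavior at a different stage.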
Converging evidence from human neuroimaging studies has shown that reward prediction errors are correlated with activity in the ventral striatum. Our results demonstrate that activity in this region is simultaneously correlated with a stimulus prediction error. Furthermore, the learning system that currently dominates choice behavior dynamically engages the ventral striatum to compute its prediction errors. This demonstrates that prediction error representations are highly dynamic and influenced by the experimental context. This finding points to a general role of the ventral striatum in detecting expectancy violations and encoding error signals, regardless of the specific nature of the reinforcer itself.
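The two error signals contrasted above can be illustrated with a minimal sketch built on simple delta-rule learners. Here a reward prediction error is the obtained minus the expected reward. A stimulus prediction error is taken as 1 minus the predicted probability of the stimulus that actually appeared. That form of the stimulus prediction error, the learning rates `alpha` and `eta`, and all function names are illustrative assumptions rather than the model used in this study.

```python
from typing import Dict, Tuple


def reward_prediction_error(reward: float, expected: float) -> float:
    """Reward prediction error: obtained reward minus expected reward."""
    return reward - expected


def stimulus_prediction_error(probs: Dict[str, float], observed: str) -> float:
    """Stimulus prediction error (assumed form): surprise at the observed stimulus."""
    return 1.0 - probs[observed]


def update_stimulus_predictions(probs: Dict[str, float], observed: str,
                                eta: float = 0.2) -> Tuple[Dict[str, float], float]:
    """Delta-rule update of predicted stimulus probabilities.

    The observed stimulus moves toward 1 and all others shrink by (1 - eta),
    so the predictions still sum to 1 after the update.
    """
    spe = stimulus_prediction_error(probs, observed)
    updated = {s: p * (1.0 - eta) for s, p in probs.items() if s != observed}
    updated[observed] = probs[observed] + eta * spe
    return updated, spe


def update_value(value: float, reward: float, alpha: float = 0.2) -> Tuple[float, float]:
    """Rescorla-Wagner-style value update driven by the reward prediction error."""
    rpe = reward_prediction_error(reward, value)
    return value + alpha * rpe, rpe


if __name__ == "__main__":
    probs = {"A": 0.5, "B": 0.5}  # hypothetical predictions for two stimuli
    probs, spe = update_stimulus_predictions(probs, "A")
    value, rpe = update_value(0.0, 1.0)
    print(probs, spe, value, rpe)
```

Both updates share the same delta-rule structure but are driven by different errors: one by a violated expectation about *which stimulus* occurs, the other by a violated expectation about *how much reward* occurs.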