Center for Cognitive Neuroscience, Duke University, Durham, NC, United States.
Department of Experimental Psychology, Faculty of Behavioural and Social Sciences, University of Groningen, Groningen, The Netherlands.
Soc Cogn Affect Neurosci. 2019 Feb 13;14(2):173-187. doi: 10.1093/scan/nsy116.
Successful adaptive behavior requires the learning of associations between stimulus-specific choices and rewarding outcomes. Most research on the mechanisms underlying such processes has focused on subcortical reward-processing regions, in conjunction with frontal circuits. Given the extensive stimulus-specific coding in the sensory cortices, we hypothesized that they would play a key role in the learning of stimulus-specific reward associations. We recorded electrical brain activity (using electroencephalography, EEG) during a learning-based gambling task in which, on each trial, participants chose between a face and a house and then received feedback (gain or loss). Within each 20-trial set, either faces or houses were more likely to predict a gain. Results showed that early feedback processing (200-1200 ms) was independent of the choice made. In contrast, later feedback processing (1400-1800 ms) was stimulus-specific: alpha power decreased (reflecting increased cortical activity) over face-selective regions for wins versus losses after a face choice, but not after a house choice. Finally, as the reward association was learned within a set, there was an increasingly strong attentional bias towards the stimulus more likely to win, reflected by increasing attentional-orienting-related brain activity and an increasing likelihood of choosing that stimulus. These results delineate the processes underlying the updating of stimulus-reward associations during feedback-guided learning, which then guide future attentional allocation and decision-making.
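The task structure described in the abstract (20-trial sets, a face-or-house choice on each trial, gain or loss feedback, and one category more likely to win within a set) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the gain probabilities, the learning rate, the exploration rate, and the simple delta-rule learner are hypothetical choices made for illustration and are not taken from the study, which reports EEG measures rather than a computational model.

```python
import random


def simulate_set(n_trials=20, p_gain_good=0.75, p_gain_bad=0.25,
                 learning_rate=0.3, explore=0.1, seed=0):
    """Simulate one set of the face/house gambling task.

    All numeric parameters are illustrative assumptions, not values
    reported in the abstract.
    """
    rng = random.Random(seed)
    options = ["face", "house"]
    # Within a set, one category is more likely to predict a gain.
    good = rng.choice(options)
    # Start with no preference between the two categories.
    value = {"face": 0.5, "house": 0.5}
    history = []
    for _ in range(n_trials):
        # Mostly pick the currently higher-valued option; explore
        # occasionally, and pick at random when the values are tied.
        if rng.random() < explore or value["face"] == value["house"]:
            choice = rng.choice(options)
        else:
            choice = max(options, key=lambda name: value[name])
        p_gain = p_gain_good if choice == good else p_gain_bad
        gain = rng.random() < p_gain
        # Delta-rule update of the chosen option only (assumed learner).
        outcome = 1.0 if gain else 0.0
        value[choice] += learning_rate * (outcome - value[choice])
        history.append((choice, gain))
    return good, history


if __name__ == "__main__":
    good, history = simulate_set(seed=1)
    n_good_late = sum(choice == good for choice, _ in history[10:])
    print(f"more rewarding category: {good}")
    print(f"choices of that category in trials 11-20: {n_good_late}")
```

Under these assumptions, the simulated learner tends to choose the more rewarding category more often late in a set, which loosely parallels the increasing choice bias the abstract reports; it says nothing about the neural measures.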