Anne G. E. Collins, Brittany Ciullo, Michael J. Frank, David Badre
Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720.
J Neurosci. 2017 Apr 19;37(16):4332-4342. doi: 10.1523/JNEUROSCI.2700-16.2017. Epub 2017 Mar 20.
Reinforcement learning (RL) in simple instrumental tasks is usually modeled as a monolithic process in which reward prediction errors (RPEs) are used to update the expected values of choice options. This formulation ignores the distinct contributions of the multiple memory and decision-making systems thought to be at work even in simple learning. In an fMRI experiment, we investigated how working memory (WM) and incremental RL processes interact to guide human learning. WM load was manipulated by varying the number of stimuli to be learned across blocks. Behavioral results and computational modeling confirmed that learning was best explained as a mixture of two mechanisms: a fast, capacity-limited, and delay-sensitive WM process together with slower, incremental RL. Model-based analysis of the fMRI data showed that the striatum and lateral prefrontal cortex were sensitive to RPEs, as shown previously; critically, however, these signals were reduced when the learning problem was within the capacity of WM. The degree of this neural interaction was related to individual differences in the use of WM to guide behavioral learning. These results indicate that the two systems do not process information independently, but rather interact during learning.

RL theory has been remarkably productive in improving our understanding of instrumental learning, as well as of dopaminergic and striatal network function, across many mammalian species. However, this neural network is only one contributor to human learning, and other mechanisms, such as prefrontal cortex-based working memory, also play a key role. Our results further show that these other players interact with the dopaminergic RL system, interfering with its key computation of reward prediction errors.
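The two-mechanism account described above can be sketched in code. The following is a minimal, illustrative implementation of an RL+WM mixture of the kind the abstract describes: a slow delta-rule RL learner alongside a fast, one-shot WM store that is capacity-limited (its weight in the choice policy shrinks when the set size exceeds capacity K) and delay-sensitive (its traces decay toward uniform). All parameter names and values here (alpha, beta, K, rho, decay) are assumptions following common conventions for this model family, not the paper's fitted parameters.

```python
import numpy as np

class RLWM:
    """Hypothetical sketch of a mixture of slow RL and fast, capacity-limited WM."""

    def __init__(self, n_stim, n_actions, alpha=0.1, beta=8.0,
                 K=3, rho=0.9, decay=0.05):
        # All parameter values are illustrative assumptions.
        self.n_actions = n_actions
        self.alpha = alpha   # RL learning rate (slow, incremental)
        self.beta = beta     # softmax inverse temperature
        self.decay = decay   # WM forgetting rate (delay sensitivity)
        # WM reliance is scaled down when set size exceeds capacity K
        self.w = rho * min(1.0, K / n_stim)
        self.Q = np.ones((n_stim, n_actions)) / n_actions   # RL values
        self.WM = np.ones((n_stim, n_actions)) / n_actions  # WM traces

    def _softmax(self, v):
        e = np.exp(self.beta * (v - v.max()))
        return e / e.sum()

    def policy(self, s):
        # Choice probabilities: capacity-weighted mixture of the two systems
        return (self.w * self._softmax(self.WM[s])
                + (1.0 - self.w) * self._softmax(self.Q[s]))

    def update(self, s, a, r):
        # Slow incremental RL: delta-rule update driven by the RPE
        rpe = r - self.Q[s, a]
        self.Q[s, a] += self.alpha * rpe
        # Fast one-shot WM encoding of the most recent outcome...
        self.WM[s, a] = r
        # ...while all WM traces decay toward uniform between trials
        self.WM += self.decay * (1.0 / self.n_actions - self.WM)
        return rpe
```

With a small set size (within capacity), the WM weight `w` is high and a single rewarded trial nearly determines the next choice; with a large set size, `w` drops and behavior leans on the slower RL values, which is the behavioral signature the modeling in the abstract exploits.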