University of California, Berkeley.
J Cogn Neurosci. 2018 Oct;30(10):1422-1432. doi: 10.1162/jocn_a_01238. Epub 2018 Jan 18.
Learning to make rewarding choices in response to stimuli depends on a slow but steady process, reinforcement learning, and a fast and flexible, but capacity-limited process, working memory. Using both systems in parallel, with their contributions weighted based on performance, should allow us to leverage the best of each system: rapid early learning, supplemented by long-term robust acquisition. However, this assumes that using one process does not interfere with the other. We use computational modeling to investigate the interactions between the two processes in a behavioral experiment and show that working memory interferes with reinforcement learning. Previous research showed that neural representations of reward prediction errors, a key marker of reinforcement learning, were blunted when working memory was used for learning. We thus predicted that arbitrating in favor of working memory to learn faster in simple problems would weaken the reinforcement learning process. We tested this by measuring performance in a delayed testing phase where the use of working memory was impossible, and thus participant choices depended on reinforcement learning. Counterintuitively, but confirming our predictions, we observed that associations learned most easily were retained worse than associations learned more slowly: Using working memory to learn quickly came at the cost of long-term retention. Computational modeling confirmed that this could only be accounted for by working memory interference in reinforcement learning computations. These results further our understanding of how multiple systems contribute in parallel to human learning and may have important applications for education and computational psychiatry.
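The interference idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' fitted model: the function names, the interference weight `eta`, the learning rate `alpha`, the inverse temperature `beta`, and the assumption that working memory stores a rewarded association perfectly and without decay are all hypothetical choices made for illustration. The sketch assumes a delta-rule learner whose reward prediction error is computed against a blend of its own expectation and the working memory expectation, so that a confident working memory shrinks the error and slows reinforcement learning.

```python
import math


def softmax(values, beta):
    """Turn action values into choice probabilities (beta = inverse temperature)."""
    top = max(values)
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_policy(q, wm, w, beta=8.0):
    """Weighted mixture of the two systems' policies; w is the weight on working memory."""
    p_rl = softmax(q, beta)
    p_wm = softmax(wm, beta)
    return [w * pw + (1 - w) * pr for pw, pr in zip(p_wm, p_rl)]


def rl_update(q, wm, action, reward, alpha=0.1, eta=0.0):
    """One delta-rule update of the reinforcement learning values.

    eta in [0, 1] is an assumed interference weight: with eta = 0 the
    prediction error uses only the RL expectation; with eta = 1 it uses
    only the working memory expectation.
    """
    expected = (1 - eta) * q[action] + eta * wm[action]
    delta = reward - expected  # reward prediction error
    new_q = list(q)
    new_q[action] += alpha * delta
    return new_q


def simulate(n_trials, eta, alpha=0.1, n_actions=3):
    """Repeat rewarded choices of action 0 for one stimulus and return the RL values."""
    q = [1.0 / n_actions] * n_actions
    wm = [1.0 / n_actions] * n_actions
    correct = 0
    for _ in range(n_trials):
        q = rl_update(q, wm, correct, 1.0, alpha, eta)
        # Assumption: the item fits within capacity, so working memory
        # holds the rewarded association perfectly after one trial.
        wm[correct] = 1.0
    return q


if __name__ == "__main__":
    independent = simulate(5, eta=0.0)
    interfered = simulate(5, eta=1.0)
    print("RL value without interference:", independent[0])
    print("RL value with interference:   ", interfered[0])
```

Under these assumptions, the value acquired by the reinforcement learning system is lower when working memory contributes to the prediction error, which is the qualitative pattern the delayed testing phase is designed to detect: choices at test would rely on `q` alone.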