Santoro Adam, Frankland Paul W, Richards Blake A
Institute of Medical Sciences, University of Toronto, Toronto, Ontario M5S 1A8, Canada.
Program in Neurosciences and Mental Health, Hospital for Sick Children, Toronto, Ontario M5G 1X8, Canada.
J Neurosci. 2016 Nov 30;36(48):12228-12242. doi: 10.1523/JNEUROSCI.0763-16.2016.
Over the course of systems consolidation, there is a switch from a reliance on detailed episodic memories to generalized schematic memories. This switch is sometimes referred to as "memory transformation." Here we demonstrate a previously unappreciated benefit of memory transformation, namely, its ability to enhance reinforcement learning in a dynamic environment. We developed a neural network that is trained to find rewards in a foraging task where reward locations are continuously changing. The network can use memories for specific locations (episodic memories) and statistical patterns of locations (schematic memories) to guide its search. We find that switching from an episodic to a schematic strategy over time leads to enhanced performance because the reward location tends to be highly correlated with itself in the short term but regresses to a stable distribution in the long term. We also show that the statistics of the environment determine the optimal utilization of both types of memory. Our work recasts the theoretical question of why memory transformation occurs, shifting the focus from the avoidance of memory interference toward the enhancement of reinforcement learning across multiple timescales.
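The intuition behind the episodic-to-schematic switch can be illustrated with a toy calculation. The sketch below is not the neural network or foraging task from the paper. It assumes, purely for illustration, that the reward location is a one-dimensional mean-reverting AR(1) process, and it compares two hypothetical predictors: an "episodic" one that guesses the last observed location and a "schematic" one that guesses the long-run mean. All function names and parameters (`phi`, `sigma`, `mu`) are assumptions introduced here.

```python
import math
import random


def stationary_variance(phi, sigma):
    """Long-run variance of x[t+1] = mu + phi * (x[t] - mu) + sigma * noise."""
    return sigma ** 2 / (1 - phi ** 2)


def episodic_mse(delay, phi, sigma):
    """Expected squared error when guessing the location seen `delay` steps ago."""
    return 2 * stationary_variance(phi, sigma) * (1 - phi ** delay)


def schematic_mse(phi, sigma):
    """Expected squared error when guessing the long-run mean location."""
    return stationary_variance(phi, sigma)


def crossover_delay(phi):
    """Smallest whole delay at which the schematic guess is at least as good.

    Episodic error exceeds schematic error once phi ** delay drops below 0.5.
    """
    return math.ceil(math.log(0.5) / math.log(phi))


def simulate_errors(delay, phi, sigma, mu=0.0, n_trials=20000, seed=0):
    """Monte Carlo estimate of (episodic, schematic) mean squared error."""
    rng = random.Random(seed)
    sd = math.sqrt(stationary_variance(phi, sigma))
    episodic_total = 0.0
    schematic_total = 0.0
    for _ in range(n_trials):
        observed = rng.gauss(mu, sd)  # location when the memory was formed
        current = observed
        for _ in range(delay):  # let the reward location drift
            current = mu + phi * (current - mu) + rng.gauss(0.0, sigma)
        episodic_total += (current - observed) ** 2
        schematic_total += (current - mu) ** 2
    return episodic_total / n_trials, schematic_total / n_trials


if __name__ == "__main__":
    phi, sigma = 0.9, 1.0  # illustrative values, not taken from the paper
    print("crossover delay:", crossover_delay(phi))
    for delay in (1, 5, 10, 20):
        print(delay, episodic_mse(delay, phi, sigma), schematic_mse(phi, sigma))
```

Under these assumptions the episodic guess is more accurate at short delays and the schematic guess at long delays, and the crossover point depends only on how quickly the process decorrelates (`phi`). This mirrors, in a much simpler setting, the claim that environmental statistics determine the best use of each memory type.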
As time passes, memories transform from a highly detailed state to a more gist-like state, in a process called "memory transformation." Theories of memory transformation speak to its advantages in terms of reducing memory interference, increasing memory robustness, and building models of the environment. However, the role of memory transformation from the perspective of an agent that continuously acts and receives reward in its environment is not well explored. In this work, we demonstrate a view of memory transformation that defines it as a way of optimizing behavior across multiple timescales.