Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge, UK.
Sainsbury Wellcome Centre, University College London, London, UK.
Nat Neurosci. 2024 Jul;27(7):1340-1348. doi: 10.1038/s41593-024-01675-7. Epub 2024 Jun 7.
When faced with a novel situation, people often spend substantial periods of time contemplating possible futures. For such planning to be rational, the benefits to behavior must compensate for the time spent thinking. Here, we capture these features of behavior by developing a neural network model where planning itself is controlled by the prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences from its own policy, which we call 'rollouts'. In a spatial navigation task, the agent learns to plan when it is beneficial, which provides a normative explanation for empirical variability in human thinking times. Additionally, the patterns of policy rollouts used by the artificial agent closely resemble patterns of rodent hippocampal replays. Our work provides a theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by, and adaptively affect, prefrontal dynamics.
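The idea of a rollout, an imagined action sequence sampled from the agent's own policy, can be illustrated with a minimal Python sketch. This is not the authors' model: the open grid, the `step` and `sample_rollout` helpers, and the dictionary-valued policy are hypothetical stand-ins chosen for illustration, and the recurrent network that decides when to plan is omitted entirely.

```python
import random

# Hypothetical action set for a small open grid (an assumption, not the paper's task).
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(state, action, size):
    """Apply an action on a size x size grid; moves off the edge leave the state unchanged."""
    dx, dy = ACTIONS[action]
    x, y = state
    nx, ny = x + dx, y + dy
    if 0 <= nx < size and 0 <= ny < size:
        return (nx, ny)
    return state


def sample_rollout(policy, state, goal, size, max_len, rng):
    """Sample one imagined action sequence from the policy.

    The rollout stops when the imagined state reaches the goal or after
    max_len actions. Returns the action list and whether the goal was reached.
    """
    actions = []
    for _ in range(max_len):
        if state == goal:
            break
        probs = policy(state)  # mapping: action name -> probability
        names = list(probs)
        action = rng.choices(names, weights=[probs[a] for a in names])[0]
        actions.append(action)
        state = step(state, action, size)  # imagined transition, no real movement
    return actions, state == goal


def uniform_policy(state):
    """Placeholder policy that ignores the state (assumption for illustration)."""
    return {a: 0.25 for a in ACTIONS}


if __name__ == "__main__":
    rng = random.Random(0)
    imagined, reached = sample_rollout(uniform_policy, (0, 0), (2, 2), 3, 8, rng)
    print(imagined, reached)
```

In the abstract's framing, the outcome of such a rollout would inform the agent's subsequent behavior, and the agent would learn when the time spent on a rollout is worth it. Here the caller simply receives the sampled sequence and a success flag.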