Modeling awake hippocampal reactivations with model-based bidirectional search.

Authors

Khamassi Mehdi, Girard Benoît

Affiliation

Institute of Intelligent Systems and Robotics (ISIR), Sorbonne Université and CNRS (Centre National de la Recherche Scientifique), 75005, Paris, France.

Publication information

Biol Cybern. 2020 Apr;114(2):231-248. doi: 10.1007/s00422-020-00817-x. Epub 2020 Feb 17.

DOI:10.1007/s00422-020-00817-x

PMID:32065253

Abstract

Hippocampal offline reactivations during reward-based learning, usually categorized as replay events, have been found to be important for performance improvement over time and for memory consolidation. Recent computational work has linked these phenomena to the need to transform reward information into state-action values for decision making and to propagate it to all relevant states of the environment. Nevertheless, it is still unclear whether an integrated reinforcement learning mechanism could account for the variety of awake hippocampal reactivations, including variety in order (forward and reverse reactivated trajectories) and variety in the location where they occur (reward site or decision-point). Here, we present a model-based bidirectional search model which accounts for a variety of hippocampal reactivations. The model combines forward trajectory sampling from current position and backward sampling through prioritized sweeping from states associated with large reward prediction errors until the two trajectories connect. This is repeated until stabilization of state-action values (convergence), which could explain why hippocampal reactivations drastically diminish when the animal's performance stabilizes. Simulations in a multiple T-maze task show that forward reactivations are prominently found at decision-points while backward reactivations are exclusively generated at reward sites. Finally, the model can generate imaginary trajectories that are not allowed to the agent during task performance. We raise some experimental predictions and implications for future studies of the role of the hippocampo-prefronto-striatal network in learning.
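The loop described in the abstract (forward sampling from the current position, backward prioritized sweeping from states with large reward prediction errors, stopping when the two meet, and repeating until values stabilize) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the six-state maze, the names `MODEL`, `REWARD`, `replay_event`, and `run`, the uniform random forward policy, and the values of `GAMMA`, `THETA`, and `depth` are all assumptions chosen for illustration.

```python
import heapq
import random

# Hypothetical toy maze (not from the paper): state -> {action: next_state}.
# 0 is the start, 3 is a decision point, 5 is the rewarded arm end.
MODEL = {0: {"fwd": 1}, 1: {"fwd": 2}, 2: {"fwd": 3},
         3: {"left": 4, "right": 5}, 4: {}, 5: {}}
REWARD = {(3, "right"): 1.0}
GAMMA = 0.9    # discount factor (assumed value)
THETA = 1e-3   # smallest prediction error worth replaying (assumed value)


def backup_error(q, s, a):
    """Model-based reward prediction error for the pair (s, a)."""
    s2 = MODEL[s][a]
    best_next = max((q[(s2, b)] for b in MODEL[s2]), default=0.0)
    return REWARD.get((s, a), 0.0) + GAMMA * best_next - q[(s, a)]


def predecessors(s):
    """All (state, action) pairs that the model says lead into s."""
    return [(p, a) for p, acts in MODEL.items()
            for a, s2 in acts.items() if s2 == s]


def forward_sample(start, rng, depth=2):
    """Sample a short forward trajectory through the model from start."""
    path = [start]
    while MODEL[path[-1]] and len(path) <= depth:
        action = rng.choice(sorted(MODEL[path[-1]]))
        path.append(MODEL[path[-1]][action])
    return path


def replay_event(q, queue, position, rng):
    """One bidirectional search: a forward sample, then a backward sweep
    in priority order until it reaches a forward-sampled state."""
    forward = forward_sample(position, rng)
    backward = []
    while queue:
        _, s, a = heapq.heappop(queue)
        err = backup_error(q, s, a)
        if abs(err) < THETA:
            continue  # stale queue entry, nothing left to learn here
        q[(s, a)] += err  # full backup (deterministic model)
        backward.append(s)
        for p, b in predecessors(s):
            priority = abs(backup_error(q, p, b))
            if priority >= THETA:
                heapq.heappush(queue, (-priority, p, b))
        if s in forward:
            break  # the two trajectories connect
    return forward, backward


def run(position=0, seed=0):
    """Repeat replay events until no prediction error remains queued."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, acts in MODEL.items() for a in acts}
    # Seed the queue with the pair whose reward was just experienced.
    queue = [(-abs(backup_error(q, 3, "right")), 3, "right")]
    events = []
    while queue:
        forward, backward = replay_event(q, queue, position, rng)
        if backward:
            events.append((forward, backward))
    return q, events
```

Calling `run()` on this toy maze produces three replay events and then stops because the priority queue is empty, which mirrors, in a very reduced form, the abstract's point that reactivations cease once state-action values have converged. The sketch does not model where in the maze each reactivation occurs.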

