The mixed instrumental controller: using value of information to combine habitual choice and mental simulation.

Affiliations

Istituto di Linguistica Computazionale "Antonio Zampolli", Consiglio Nazionale delle Ricerche, Pisa, Italy; Istituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche, Roma, Italy.

Publication information

Front Psychol. 2013 Mar 4;4:92. doi: 10.3389/fpsyg.2013.00092. eCollection 2013.

Abstract

Instrumental behavior depends on both goal-directed and habitual mechanisms of choice. Normative views cast these mechanisms in terms of model-based and model-free methods of reinforcement learning, respectively. An influential proposal hypothesizes that model-free and model-based mechanisms coexist and compete in the brain according to their relative uncertainty. In this paper we propose a novel view in which a single Mixed Instrumental Controller produces both goal-directed and habitual behavior by flexibly balancing and combining model-based and model-free computations. The Mixed Instrumental Controller performs a cost-benefit analysis to decide whether to choose an action immediately, based on the available "cached" values of actions (linked to model-free mechanisms), or to improve value estimation by mentally simulating the expected outcome values (linked to model-based mechanisms). Since mental simulation entails cognitive effort and increases the reward delay, it is activated only when the associated "Value of Information" exceeds its costs. The model proposes a method to compute the Value of Information based on the uncertainty of action values and on the distance between alternative cached action values. Overall, by default the model chooses on the basis of computationally cheap model-free estimates, and integrates them with costly model-based predictions only when this is useful. Mental simulation uses a sampling method to produce reward expectancies, which are used to update the cached value of one or more actions; the updated values are then used for the choice. The key predictions of the model are tested in different settings of a double T-maze scenario. Results are discussed in relation to neurobiological evidence on the hippocampus-ventral striatum circuit in rodents, which has been linked to goal-directed spatial navigation.
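The abstract describes the controller's arbitration only in words. The Python sketch below is one way the pieces could fit together, and it rests on assumptions that are not taken from the paper: cached values are treated as Gaussian beliefs (the hypothetical `CachedValue`); the Value of Information is a myopic value-of-perfect-information formula in the style of Bayesian Q-learning, swapped in for the authors' own equation; mental simulation draws rewards from a hypothetical `WorldModel` and folds them in with a conjugate Gaussian update; and `cost`, `n_samples` and `obs_var` are illustrative parameters. The formula does, however, show the two dependencies the abstract names: VoI grows with the uncertainty of the action values and shrinks as the cached values move apart.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass
from typing import Callable

# Hypothetical internal (model-based) predictor: returns one sampled reward for an action.
WorldModel = Callable[[str, random.Random], float]


@dataclass(frozen=True)
class CachedValue:
    """Assumed representation: a Gaussian belief over one action's cached value."""
    mean: float  # model-free (cached) value estimate
    std: float   # uncertainty of that estimate


def _pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_gain(b: CachedValue, threshold: float, above: bool) -> float:
    """E[max(0, Q - t)] if above, else E[max(0, t - Q)], where Q ~ N(b.mean, b.std**2)."""
    d = (b.mean - threshold) if above else (threshold - b.mean)
    if b.std <= 0.0:
        return max(0.0, d)
    z = d / b.std
    return d * _cdf(z) + b.std * _pdf(z)


def value_of_information(values: dict[str, CachedValue]) -> dict[str, float]:
    """Myopic VoI per action: grows with uncertainty, shrinks as cached values move apart."""
    ranked = sorted(values, key=lambda a: values[a].mean, reverse=True)
    best, runner_up = ranked[0], ranked[1]  # requires at least two actions
    voi = {}
    for action, belief in values.items():
        if action == best:
            # Worth simulating if the favourite is really worse than the runner-up.
            voi[action] = expected_gain(belief, values[runner_up].mean, above=False)
        else:
            # Worth simulating if this action is really better than the favourite.
            voi[action] = expected_gain(belief, values[best].mean, above=True)
    return voi


def mental_simulation(action: str, belief: CachedValue, model: WorldModel,
                      n_samples: int, obs_var: float, rng: random.Random) -> CachedValue:
    """Sample reward expectancies from the model and fold them into the cached value."""
    if belief.std <= 0.0:
        return belief  # a zero-uncertainty cached value is left unchanged by this rule
    sample_mean = sum(model(action, rng) for _ in range(n_samples)) / n_samples
    # Assumed update: conjugate Gaussian update treating the sample mean as an
    # observation with variance obs_var / n_samples.
    prior_prec = 1.0 / belief.std ** 2
    obs_prec = n_samples / obs_var
    post_prec = prior_prec + obs_prec
    post_mean = (prior_prec * belief.mean + obs_prec * sample_mean) / post_prec
    return CachedValue(post_mean, math.sqrt(1.0 / post_prec))


def mixed_controller(values: dict[str, CachedValue], model: WorldModel, cost: float,
                     n_samples: int = 20, obs_var: float = 1.0,
                     rng: random.Random | None = None) -> tuple[str, bool]:
    """Act on cached values; simulate first only when VoI exceeds the cost of simulating."""
    rng = rng if rng is not None else random.Random(0)
    voi = value_of_information(values)
    target = max(voi, key=voi.get)  # simplification: simulate only the most informative action
    simulated = voi[target] > cost
    if simulated:
        values = dict(values)  # do not mutate the caller's cache
        values[target] = mental_simulation(target, values[target], model,
                                           n_samples, obs_var, rng)
    choice = max(values, key=lambda a: values[a].mean)
    return choice, simulated


if __name__ == "__main__":
    p_reward = {"left": 0.2, "right": 0.8}  # hypothetical reward probabilities per arm

    def bernoulli_model(action: str, rng: random.Random) -> float:
        return 1.0 if rng.random() < p_reward[action] else 0.0

    confident = {"left": CachedValue(0.9, 0.05), "right": CachedValue(0.1, 0.05)}
    uncertain = {"left": CachedValue(0.5, 0.3), "right": CachedValue(0.45, 0.3)}
    print(mixed_controller(confident, bernoulli_model, cost=0.01))  # ('left', False)
    print(mixed_controller(uncertain, bernoulli_model, cost=0.01))
```

With far-apart, confident cached values the VoI is negligible and the sketch acts habitually on the cached values; with close, uncertain values it pays the simulation cost first, and the simulated reward expectancies can reverse the choice. Updating only the single most informative action is a simplification of this sketch; the abstract allows the cached values of one or more actions to be updated.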

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16b/3586710/484908f9e1c2/fpsyg-04-00092-g001.jpg