The mixed instrumental controller: using value of information to combine habitual choice and mental simulation.

Affiliations

Istituto di Linguistica Computazionale "Antonio Zampolli", Consiglio Nazionale delle Ricerche, Pisa, Italy; Istituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche, Roma, Italy.

Publication information

Front Psychol. 2013 Mar 4;4:92. doi: 10.3389/fpsyg.2013.00092. eCollection 2013.

DOI: 10.3389/fpsyg.2013.00092

PMID: 23459512

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3586710/

Abstract

Instrumental behavior depends on both goal-directed and habitual mechanisms of choice. Normative views cast these mechanisms in terms of model-based and model-free methods of reinforcement learning, respectively. An influential proposal hypothesizes that model-free and model-based mechanisms coexist and compete in the brain according to their relative uncertainty. In this paper we propose a novel view in which a single Mixed Instrumental Controller produces both goal-directed and habitual behavior by flexibly balancing and combining model-based and model-free computations. The Mixed Instrumental Controller performs a cost-benefit analysis to decide whether to choose an action immediately based on the available "cached" values of actions (linked to model-free mechanisms) or to improve value estimation by mentally simulating the expected outcome values (linked to model-based mechanisms). Since mental simulation entails cognitive effort and increases the reward delay, it is activated only when the associated "Value of Information" exceeds its costs. The model proposes a method to compute the Value of Information, based on the uncertainty of action values and on the distance between alternative cached action values. Overall, the model by default chooses on the basis of computationally lighter model-free estimates, and integrates them with costly model-based predictions only when useful. Mental simulation uses a sampling method to produce reward expectancies, which are used to update the cached value of one or more actions; in turn, this updated value is used for the choice. The key predictions of the model are tested in different settings of a double T-maze scenario. Results are discussed in relation to neurobiological evidence on the hippocampus-ventral striatum circuit in rodents, which has been linked to goal-directed spatial navigation.
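The control loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the Value of Information formula, so the ratio used here (uncertainty divided by the distance to the closest alternative cached value) is only one assumed form that matches the verbal description. The names `CachedValue`, `value_of_information`, `mentally_simulate`, `choose`, and `maze_model`, and the parameters `eps`, `cost`, `rate`, and `n_samples`, are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class CachedValue:
    mean: float      # cached ("model-free") estimate of the action's value
    variance: float  # uncertainty about that estimate


def value_of_information(cache, action, eps=0.1):
    """Assumed VoI form: high when the action's value is uncertain and the
    closest alternative has a similar cached value. Needs >= 2 actions."""
    others = [v.mean for a, v in cache.items() if a != action]
    distance = min(abs(cache[action].mean - m) for m in others)
    return cache[action].variance / (distance + eps)


def mentally_simulate(action, reward_model, n_samples, rng):
    """Sample imagined outcomes from an internal model and average them."""
    samples = [reward_model(action, rng) for _ in range(n_samples)]
    return sum(samples) / n_samples


def choose(cache, reward_model, cost, n_samples=20, rate=0.5, rng=None):
    """Pick an action, simulating only those whose VoI exceeds the cost."""
    rng = rng or random.Random(0)
    # Compute VoI from the cache as it stands before any update.
    voi = {a: value_of_information(cache, a) for a in cache}
    simulated = []
    for action, v in voi.items():
        if v > cost:
            expected = mentally_simulate(action, reward_model, n_samples, rng)
            entry = cache[action]
            entry.mean += rate * (expected - entry.mean)  # update cached value
            entry.variance *= 1.0 - rate  # assume simulation reduces uncertainty
            simulated.append(action)
    # The final choice always reads from the (possibly updated) cache.
    best = max(cache, key=lambda a: cache[a].mean)
    return best, simulated


def maze_model(action, rng):
    """Hypothetical internal model of a T-maze arm: noisy reward per arm."""
    true_reward = {"left": 1.0, "right": 2.0}[action]
    return true_reward + rng.gauss(0.0, 0.1)
```

With well-separated, low-variance cached values, for example `{"left": CachedValue(3.0, 0.05), "right": CachedValue(1.0, 0.05)}` and `cost=1.0`, no action's VoI exceeds the cost, so the sketch chooses habitually from the cache. With close, uncertain values it simulates first and the choice follows the updated cache.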

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16b/3586710/484908f9e1c2/fpsyg-04-00092-g001.jpg

Similar articles

The mixed instrumental controller: using value of information to combine habitual choice and mental simulation.

Front Psychol. 2013 Mar 4;4:92. doi: 10.3389/fpsyg.2013.00092. eCollection 2013.

Using hippocampal-striatal loops for spatial navigation and goal-directed decision-making.

Cogn Process. 2012 Aug;13 Suppl 1:S125-9. doi: 10.1007/s10339-012-0475-7.

Model-based spatial navigation in the hippocampus-ventral striatum circuit: A computational analysis.

PLoS Comput Biol. 2018 Sep 17;14(9):e1006316. doi: 10.1371/journal.pcbi.1006316. eCollection 2018 Sep.

Speed/accuracy trade-off between the habitual and the goal-directed processes.

PLoS Comput Biol. 2011 May;7(5):e1002055. doi: 10.1371/journal.pcbi.1002055. Epub 2011 May 26.

NLM-HS: Navigation Learning Model Based on a Hippocampal-Striatal Circuit for Explaining Navigation Mechanisms in Animal Brains.

Brain Sci. 2021 Jun 17;11(6):803. doi: 10.3390/brainsci11060803.

Adaptive integration of habits into depth-limited planning defines a habitual-goal-directed spectrum.

Proc Natl Acad Sci U S A. 2016 Nov 8;113(45):12868-12873. doi: 10.1073/pnas.1609094113. Epub 2016 Oct 24.

Habits, action sequences and reinforcement learning.

Eur J Neurosci. 2012 Apr;35(7):1036-51. doi: 10.1111/j.1460-9568.2012.08050.x.

Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies.

Front Behav Neurosci. 2012 Nov 27;6:79. doi: 10.3389/fnbeh.2012.00079. eCollection 2012.

The orbitofrontal cortex, predicted value, and choice.

Ann N Y Acad Sci. 2011 Dec;1239:43-50. doi: 10.1111/j.1749-6632.2011.06270.x.

Human Choice Strategy Varies with Anatomical Projections from Ventromedial Prefrontal Cortex to Medial Striatum.

J Neurosci. 2016 Mar 9;36(10):2857-67. doi: 10.1523/JNEUROSCI.2033-15.2016.

Cited by

Adaptive planning depth in human problem-solving.

R Soc Open Sci. 2025 Apr 9;12(4):241161. doi: 10.1098/rsos.241161. eCollection 2025 Apr.

Noradrenergic and Dopaminergic modulation of meta-cognition and meta-control.

PLoS Comput Biol. 2025 Feb 26;21(2):e1012675. doi: 10.1371/journal.pcbi.1012675. eCollection 2025 Feb.

A dopaminergic basis of behavioral control.

bioRxiv. 2024 Oct 2:2024.09.17.613524. doi: 10.1101/2024.09.17.613524.

Distinct value computations support rapid sequential decisions.

Nat Commun. 2023 Nov 21;14(1):7573. doi: 10.1038/s41467-023-43250-x.

Habit formation viewed as structural change in the behavioral network.

Commun Biol. 2023 Apr 4;6(1):303. doi: 10.1038/s42003-023-04500-2.

Distinct cortico-striatal compartments drive competition between adaptive and automatized behavior.

PLoS One. 2023 Mar 21;18(3):e0279841. doi: 10.1371/journal.pone.0279841. eCollection 2023.

The 3Ps: A tool for coach observation.

Front Sports Act Living. 2023 Jan 20;4:1066378. doi: 10.3389/fspor.2022.1066378. eCollection 2022.

Importance of prefrontal meta control in human-like reinforcement learning.

Front Comput Neurosci. 2022 Dec 21;16:1060101. doi: 10.3389/fncom.2022.1060101. eCollection 2022.

Model-Based and Model-Free Replay Mechanisms for Reinforcement Learning in Neurorobotics.

Front Neurorobot. 2022 Jun 24;16:864380. doi: 10.3389/fnbot.2022.864380. eCollection 2022.

Freezing revisited: coordinated autonomic and central optimization of threat coping.

Nat Rev Neurosci. 2022 Sep;23(9):568-580. doi: 10.1038/s41583-022-00608-2. Epub 2022 Jun 27.

References

Goal-directed decision making in prefrontal cortex: A computational framework.

Adv Neural Inf Process Syst. 2009;21:169-176.

A spiking neuron model of the cortico-basal ganglia circuits for goal-directed and habitual action learning.

Neural Netw. 2013 May;41:212-24. doi: 10.1016/j.neunet.2012.11.009. Epub 2012 Dec 5.

Retrospective revaluation in sequential decision making: a tale of two systems.

J Exp Psychol Gen. 2014 Feb;143(1):182-94. doi: 10.1037/a0030844. Epub 2012 Dec 10.

The future of memory: remembering, imagining, and the brain.

Neuron. 2012 Nov 21;76(4):677-94. doi: 10.1016/j.neuron.2012.11.001.

Aversive pavlovian responses affect human instrumental motor performance.

Front Neurosci. 2012 Oct 8;6:134. doi: 10.3389/fnins.2012.00134. eCollection 2012.

The basal ganglia optimize decision making over general perceptual hypotheses.

Neural Comput. 2012 Nov;24(11):2924-45. doi: 10.1162/NECO_a_00360. Epub 2012 Aug 24.

Using hippocampal-striatal loops for spatial navigation and goal-directed decision-making.

Cogn Process. 2012 Aug;13 Suppl 1:S125-9. doi: 10.1007/s10339-012-0475-7.

Making decisions through a distributed consensus.

Curr Opin Neurobiol. 2012 Dec;22(6):927-36. doi: 10.1016/j.conb.2012.05.007. Epub 2012 Jun 8.

Information processing in decision-making systems.

Neuroscientist. 2012 Aug;18(4):342-59. doi: 10.1177/1073858411435128. Epub 2012 Apr 9.

Habits, action sequences and reinforcement learning.

Eur J Neurosci. 2012 Apr;35(7):1036-51. doi: 10.1111/j.1460-9568.2012.08050.x.
