
The mixed instrumental controller: using value of information to combine habitual choice and mental simulation.

Affiliations

Istituto di Linguistica Computazionale, "Antonio Zampolli," Consiglio Nazionale delle Ricerche, Pisa, Italy; Istituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche, Roma, Italy.

Publication information

Front Psychol. 2013 Mar 4;4:92. doi: 10.3389/fpsyg.2013.00092. eCollection 2013.

DOI:10.3389/fpsyg.2013.00092
PMID:23459512
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3586710/
Abstract

Instrumental behavior depends on both goal-directed and habitual mechanisms of choice. Normative views cast these mechanisms in terms of model-based and model-free methods of reinforcement learning, respectively. An influential proposal hypothesizes that model-free and model-based mechanisms coexist and compete in the brain according to their relative uncertainty. In this paper we propose a novel view in which a single Mixed Instrumental Controller produces both goal-directed and habitual behavior by flexibly balancing and combining model-based and model-free computations. The Mixed Instrumental Controller performs a cost-benefit analysis to decide whether to choose an action immediately, based on the available "cached" values of actions (linked to model-free mechanisms), or to improve value estimation by mentally simulating the expected outcome values (linked to model-based mechanisms). Since mental simulation entails cognitive effort and increases the reward delay, it is activated only when the associated "Value of Information" exceeds its costs. The model proposes a method to compute the Value of Information based on the uncertainty of action values and on the distance between alternative cached action values. Overall, the model by default chooses on the basis of lighter model-free estimates, and integrates them with costly model-based predictions only when useful. Mental simulation uses a sampling method to produce reward expectancies, which are used to update the cached value of one or more actions; in turn, this updated value is used for the choice. The key predictions of the model are tested in different settings of a double T-maze scenario. Results are discussed in relation to neurobiological evidence on the hippocampus-ventral striatum circuit in rodents, which has been linked to goal-directed spatial navigation.
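The abstract describes the controller's decision rule only qualitatively, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each action carries a Gaussian belief (a cached mean and a standard deviation) and swaps in the myopic value-of-perfect-information formula known from Bayesian Q-learning as a stand-in for the paper's own Value of Information computation. `Belief`, `value_of_information`, `choose`, the `simulate` callback (a hypothetical stand-in for sampling from an internal model), the fixed `cost`, and the variance-shrinking update are all illustrative assumptions. The sketch keeps the two properties the abstract names: VoI grows with an action's uncertainty and falls as the gap between the best and second-best cached values widens.

```python
import math
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Belief:
    """Hypothetical Gaussian belief about one action: cached value and its uncertainty."""
    mean: float
    std: float


def _norm_pdf(z: float) -> float:
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    # Standard normal cumulative distribution, computed with math.erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_excess(mean: float, std: float, threshold: float) -> float:
    """Return E[max(X - threshold, 0)] for X ~ N(mean, std**2)."""
    if std <= 0.0:
        return max(mean - threshold, 0.0)
    z = (mean - threshold) / std
    return (mean - threshold) * _norm_cdf(z) + std * _norm_pdf(z)


def value_of_information(beliefs: dict[str, Belief]) -> dict[str, float]:
    """Myopic value of perfect information per action (swapped-in stand-in).

    Requires at least two actions. Larger when an action is uncertain,
    smaller when the best and second-best cached values are far apart.
    """
    ranked = sorted(beliefs, key=lambda a: beliefs[a].mean, reverse=True)
    best, second = ranked[0], ranked[1]
    voi = {}
    for action, b in beliefs.items():
        if action == best:
            # Expected gain if the apparent best is actually worse than the runner-up:
            # E[max(second_mean - X, 0)], written via the mirrored variable -X.
            voi[action] = expected_excess(-b.mean, b.std, -beliefs[second].mean)
        else:
            # Expected gain if this alternative is actually better than the current best.
            voi[action] = expected_excess(b.mean, b.std, beliefs[best].mean)
    return voi


def choose(
    beliefs: dict[str, Belief],
    simulate: Callable[[str], float],
    cost: float,
    n_samples: int = 20,
) -> tuple[str, bool]:
    """Pick an action from the cache, simulating first only if VoI exceeds the cost.

    Mutates `beliefs` in place when simulation runs. Returns (action, simulated).
    """
    voi = value_of_information(beliefs)
    target = max(voi, key=voi.get)
    simulated = False
    if voi[target] > cost:
        # "Mental simulation": sample returns for the most informative action
        # from an internal model and overwrite its cached estimate.
        samples = [simulate(target) for _ in range(n_samples)]
        new_mean = sum(samples) / n_samples
        # Crude assumption: uncertainty shrinks like a standard error.
        beliefs[target] = Belief(new_mean, beliefs[target].std / math.sqrt(n_samples))
        simulated = True
    # Final choice is greedy on the (possibly updated) cached values.
    action = max(beliefs, key=lambda a: beliefs[a].mean)
    return action, simulated
```

For example, with two hypothetical maze arms cached as `Belief(0.5, 1.0)` and `Belief(0.4, 1.0)`, each arm's VoI is about 0.35, which exceeds a cost of 0.1, so the sketch simulates before choosing; with well-separated, low-uncertainty values it answers directly from the cache, mirroring the abstract's "model-free by default, model-based only when useful" behavior.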

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16b/3586710/484908f9e1c2/fpsyg-04-00092-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16b/3586710/3c7c24c77db7/fpsyg-04-00092-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16b/3586710/4a5345434594/fpsyg-04-00092-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16b/3586710/0d3aa0ef2903/fpsyg-04-00092-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16b/3586710/bf97ff7e596d/fpsyg-04-00092-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16b/3586710/277b79b31ef5/fpsyg-04-00092-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16b/3586710/e19250955164/fpsyg-04-00092-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16b/3586710/9bebe4299e57/fpsyg-04-00092-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16b/3586710/ab8e872af225/fpsyg-04-00092-g009.jpg

Similar articles

1. The mixed instrumental controller: using value of information to combine habitual choice and mental simulation.
   Front Psychol. 2013 Mar 4;4:92. doi: 10.3389/fpsyg.2013.00092. eCollection 2013.
2. Using hippocampal-striatal loops for spatial navigation and goal-directed decision-making.
   Cogn Process. 2012 Aug;13 Suppl 1:S125-9. doi: 10.1007/s10339-012-0475-7.
3. Model-based spatial navigation in the hippocampus-ventral striatum circuit: A computational analysis.
   PLoS Comput Biol. 2018 Sep 17;14(9):e1006316. doi: 10.1371/journal.pcbi.1006316. eCollection 2018 Sep.
4. Speed/accuracy trade-off between the habitual and the goal-directed processes.
   PLoS Comput Biol. 2011 May;7(5):e1002055. doi: 10.1371/journal.pcbi.1002055. Epub 2011 May 26.
5. NLM-HS: Navigation Learning Model Based on a Hippocampal-Striatal Circuit for Explaining Navigation Mechanisms in Animal Brains.
   Brain Sci. 2021 Jun 17;11(6):803. doi: 10.3390/brainsci11060803.
6. Adaptive integration of habits into depth-limited planning defines a habitual-goal-directed spectrum.
   Proc Natl Acad Sci U S A. 2016 Nov 8;113(45):12868-12873. doi: 10.1073/pnas.1609094113. Epub 2016 Oct 24.
7. Habits, action sequences and reinforcement learning.
   Eur J Neurosci. 2012 Apr;35(7):1036-51. doi: 10.1111/j.1460-9568.2012.08050.x.
8. Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies.
   Front Behav Neurosci. 2012 Nov 27;6:79. doi: 10.3389/fnbeh.2012.00079. eCollection 2012.
9. The orbitofrontal cortex, predicted value, and choice.
   Ann N Y Acad Sci. 2011 Dec;1239:43-50. doi: 10.1111/j.1749-6632.2011.06270.x.
10. Human Choice Strategy Varies with Anatomical Projections from Ventromedial Prefrontal Cortex to Medial Striatum.
   J Neurosci. 2016 Mar 9;36(10):2857-67. doi: 10.1523/JNEUROSCI.2033-15.2016.

Cited by

1. Adaptive planning depth in human problem-solving.
   R Soc Open Sci. 2025 Apr 9;12(4):241161. doi: 10.1098/rsos.241161. eCollection 2025 Apr.
2. Noradrenergic and Dopaminergic modulation of meta-cognition and meta-control.
   PLoS Comput Biol. 2025 Feb 26;21(2):e1012675. doi: 10.1371/journal.pcbi.1012675. eCollection 2025 Feb.
3. A dopaminergic basis of behavioral control.
