The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive.

Affiliation

Department of Psychology, University of Texas at Austin, USA.

Publication information

Psychol Sci. 2013 May;24(5):751-61. doi: 10.1177/0956797612463080. Epub 2013 Apr 4.

DOI:10.1177/0956797612463080

PMID:23558545

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC3843765/

Abstract

A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior, and under what circumstances, are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people's choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
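The abstract does not spell out a computational model, so the following is only an illustrative sketch of one common way to express the trade-off it describes: a weighted mixture of model-based and model-free action values, with the weight lowered when a secondary task consumes cognitive resources. The mixing weight `w`, the function names, and all numeric values are hypothetical and are not taken from the paper.

```python
import math

def model_based_values(transition, q_stage2):
    """Plan one step ahead: for each first-stage action, average the best
    second-stage value over the states that action can lead to."""
    return [
        sum(p * max(q_stage2[state]) for state, p in enumerate(row))
        for row in transition
    ]

def hybrid_values(q_mf, q_mb, w):
    """Mix model-free and model-based values; w=1 is purely model-based."""
    return [w * mb + (1.0 - w) * mf for mf, mb in zip(q_mf, q_mb)]

def softmax(values, beta):
    """Convert values to choice probabilities (beta = inverse temperature)."""
    top = max(values)
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical task: two first-stage actions, two second-stage states.
transition = [[0.7, 0.3], [0.3, 0.7]]   # P(state | action), assumed
q_stage2 = [[0.2, 0.8], [0.4, 0.1]]     # learned second-stage values, assumed
q_mf = [0.3, 0.6]                       # cached model-free values, assumed

q_mb = model_based_values(transition, q_stage2)

# Assumption for illustration: concurrent load lowers the model-based weight.
p_no_load = softmax(hybrid_values(q_mf, q_mb, w=0.8), beta=5.0)
p_load = softmax(hybrid_values(q_mf, q_mb, w=0.2), beta=5.0)
```

With these made-up numbers, the planner favors the first action and the cached values favor the second, so lowering `w` flips the preferred choice. This is the qualitative pattern the abstract reports under a demanding secondary task.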

Similar articles

Multiple memory systems as substrates for multiple decision systems.

Neurobiol Learn Mem. 2015 Jan;117:4-13. doi: 10.1016/j.nlm.2014.04.014. Epub 2014 May 15.

When Does Model-Based Control Pay Off?

PLoS Comput Biol. 2016 Aug 26;12(8):e1005090. doi: 10.1371/journal.pcbi.1005090. eCollection 2016 Aug.

The ubiquity of model-based reinforcement learning.

Curr Opin Neurobiol. 2012 Dec;22(6):1075-81. doi: 10.1016/j.conb.2012.08.003. Epub 2012 Sep 6.

Working-memory capacity protects model-based learning from stress.

Proc Natl Acad Sci U S A. 2013 Dec 24;110(52):20941-6. doi: 10.1073/pnas.1312011110. Epub 2013 Dec 9.

Amygdala and Ventral Striatum Make Distinct Contributions to Reinforcement Learning.

Neuron. 2016 Oct 19;92(2):505-517. doi: 10.1016/j.neuron.2016.09.025. Epub 2016 Oct 6.

Cognitive control predicts use of model-based reinforcement learning.

J Cogn Neurosci. 2015 Feb;27(2):319-33. doi: 10.1162/jocn_a_00709.

Cost-Benefit Arbitration Between Multiple Reinforcement-Learning Systems.

Psychol Sci. 2017 Sep;28(9):1321-1333. doi: 10.1177/0956797617708288. Epub 2017 Jul 21.

Speed/accuracy trade-off between the habitual and the goal-directed processes.

PLoS Comput Biol. 2011 May;7(5):e1002055. doi: 10.1371/journal.pcbi.1002055. Epub 2011 May 26.

People's intuitions about intuitive insight and intuitive choice.

J Pers Soc Psychol. 2010 Aug;99(2):232-47. doi: 10.1037/a0020215.

Cited by

Forward Planning in a Population-Based Alcohol Use Disorder Sample.

Addict Biol. 2025 Aug;30(8):e70072. doi: 10.1111/adb.70072.

Model-based algorithms shape automatic evaluative processing.

Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2417068122. doi: 10.1073/pnas.2417068122. Epub 2025 Jun 20.

Female rats retain goal-directed planning of action sequences after acute stress despite changes in planning structure and action sequence execution.

Neurobiol Learn Mem. 2025 Jul;220:108063. doi: 10.1016/j.nlm.2025.108063. Epub 2025 May 15.

Imbalanced goal-directed and habitual control in individuals with internet gaming disorder.

J Behav Addict. 2025 Apr 28;14(2):831-845. doi: 10.1556/2006.2025.00037. Print 2025 Jul 2.

Differentiating Reinforcement Learning and Episodic Memory in Value-Based Decisions in Parkinson's Disease.

J Neurosci. 2025 May 21;45(21):e0911242025. doi: 10.1523/JNEUROSCI.0911-24.2025.

Evidence for shallow cognitive maps in Schizophrenia.

Cogn Affect Behav Neurosci. 2025 Mar 20. doi: 10.3758/s13415-025-01283-3.

Exploring Habits in Anorexia Nervosa: Promise, Pitfalls, and Progress.

Curr Psychiatry Rep. 2025 Apr;27(4):176-186. doi: 10.1007/s11920-025-01588-7. Epub 2025 Feb 28.

Dorsal-Ventral Reinforcement Learning Network Connectivity and Incentive-Driven Changes in Exploration.

J Neurosci. 2025 Apr 9;45(15):e0422242025. doi: 10.1523/JNEUROSCI.0422-24.2025.

Signatures of Perseveration and Heuristic-Based Directed Exploration in Two-Step Sequential Decision Task Behaviour.

Comput Psychiatr. 2025 Feb 11;9(1):39-62. doi: 10.5334/cpsy.101. eCollection 2025.

Negative affect-driven impulsivity as hierarchical model-based overgeneralization.

Trends Cogn Sci. 2025 May;29(5):407-420. doi: 10.1016/j.tics.2025.01.002. Epub 2025 Feb 6.
