How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis.

Affiliation

Department of Cognitive, Linguistic and Psychological Sciences, Brown Institute for Brain Science, Brown University, Providence, RI, USA.

Publication information

Eur J Neurosci. 2012 Apr;35(7):1024-35. doi: 10.1111/j.1460-9568.2011.07980.x.

DOI:10.1111/j.1460-9568.2011.07980.x

PMID:22487033

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3390186/

Abstract

Instrumental learning involves corticostriatal circuitry and the dopaminergic system. This system is typically modeled in the reinforcement learning (RL) framework by incrementally accumulating reward values of states and actions. However, human learning also implicates prefrontal cortical mechanisms involved in higher level cognitive functions. The interaction of these systems remains poorly understood, and models of human behavior often ignore working memory (WM) and therefore incorrectly assign behavioral variance to the RL system. Here we designed a task that highlights the profound entanglement of these two processes, even in simple learning problems. By systematically varying the size of the learning problem and delay between stimulus repetitions, we separately extracted WM-specific effects of load and delay on learning. We propose a new computational model that accounts for the dynamic integration of RL and WM processes observed in subjects' behavior. Incorporating capacity-limited WM into the model allowed us to capture behavioral variance that could not be captured in a pure RL framework even if we (implausibly) allowed separate RL systems for each set size. The WM component also allowed for a more reasonable estimation of a single RL process. Finally, we report effects of two genetic polymorphisms having relative specificity for prefrontal and basal ganglia functions. Whereas the COMT gene coding for catechol-O-methyl transferase selectively influenced model estimates of WM capacity, the GPR6 gene coding for G-protein-coupled receptor 6 influenced the RL learning rate. Thus, this study allowed us to specify distinct influences of the high-level and low-level cognitive functions on instrumental learning, beyond the possibilities offered by simple RL models.
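
To make the RL/WM distinction in the abstract concrete, here is a minimal Python sketch of one way an incremental RL learner and a capacity-limited WM store could be combined into a single choice policy. It is a hypothetical illustration, not the paper's fitted model: the delta-rule RL update, the one-shot WM store that decays toward uniform values, the mixture weight `rho * min(1, capacity / set_size)`, and the names `RLWMSketch`, `alpha`, `beta`, `capacity`, `rho` and `decay` are all assumptions chosen for clarity.

```python
import math


def softmax(values, beta):
    """Softmax choice probabilities with inverse temperature beta."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class RLWMSketch:
    """Hypothetical RL + capacity-limited WM learner (illustrative only).

    Assumptions, not taken from the abstract: a delta-rule RL module,
    a one-shot WM store that decays toward uniform values, and a WM
    mixture weight rho * min(1, capacity / set_size).
    """

    def __init__(self, n_actions, set_size, alpha, beta, capacity, rho, decay):
        self.n_actions = n_actions
        self.alpha = alpha  # RL learning rate
        self.beta = beta    # softmax inverse temperature (shared by both modules here)
        self.decay = decay  # per-trial WM forgetting toward uniform
        # WM weight shrinks once the number of stimuli exceeds WM capacity
        self.w = rho * min(1.0, capacity / set_size)
        self.init = 1.0 / n_actions
        self.q = [[self.init] * n_actions for _ in range(set_size)]   # RL values
        self.wm = [[self.init] * n_actions for _ in range(set_size)]  # WM contents

    def policy(self, s):
        """Choice probabilities for stimulus s: weighted mix of WM and RL policies."""
        p_wm = softmax(self.wm[s], self.beta)
        p_rl = softmax(self.q[s], self.beta)
        return [self.w * pw + (1.0 - self.w) * pr for pw, pr in zip(p_wm, p_rl)]

    def update(self, s, a, r):
        """Learn from reward r (0 or 1) after choosing action a for stimulus s."""
        # RL: incrementally accumulate reward value via a prediction error
        self.q[s][a] += self.alpha * (r - self.q[s][a])
        # WM: every stored association decays toward uniform on each trial,
        # so items seen longer ago are held less precisely
        for row in self.wm:
            for j in range(self.n_actions):
                row[j] += self.decay * (self.init - row[j])
        # WM: store the latest outcome for this stimulus in one shot
        self.wm[s][a] = r


if __name__ == "__main__":
    for set_size in (2, 6):
        model = RLWMSketch(n_actions=3, set_size=set_size, alpha=0.1,
                           beta=8.0, capacity=3.0, rho=0.9, decay=0.1)
        model.update(0, 1, 1.0)  # one rewarded trial: stimulus 0, action 1
        print(set_size, round(model.policy(0)[1], 3))
```

With these made-up settings, a single rewarded trial makes the rewarded action much more likely at set size 2 than at set size 6, even though the RL values are identical in both cases: only the WM weight differs (0.9 versus 0.45). A single RL system on its own has no term that depends on set size, so in a pure RL fit this kind of load effect would have to be absorbed by the RL parameters.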

Similar articles

Working memory contributions to reinforcement learning impairments in schizophrenia.

J Neurosci. 2014 Oct 8;34(41):13747-56. doi: 10.1523/JNEUROSCI.0989-14.2014.

Neural Index of Reinforcement Learning Predicts Improved Stimulus-Response Retention under High Working Memory Load.

J Neurosci. 2023 Apr 26;43(17):3131-3143. doi: 10.1523/JNEUROSCI.1274-22.2023. Epub 2023 Mar 17.

Distentangling the systems contributing to changes in learning during adolescence.

Dev Cogn Neurosci. 2020 Feb;41:100732. doi: 10.1016/j.dcn.2019.100732. Epub 2019 Nov 14.

Relevance of working memory for reinforcement learning in older adults varies with timescale of learning.

Neuropsychol Dev Cogn B Aging Neuropsychol Cogn. 2020 Sep;27(5):654-676. doi: 10.1080/13825585.2019.1664389. Epub 2019 Sep 22.

Working Memory Load Strengthens Reward Prediction Errors.

J Neurosci. 2017 Apr 19;37(16):4332-4342. doi: 10.1523/JNEUROSCI.2700-16.2017. Epub 2017 Mar 20.

Interactions Among Working Memory, Reinforcement Learning, and Effort in Value-Based Choice: A New Paradigm and Selective Deficits in Schizophrenia.

Biol Psychiatry. 2017 Sep 15;82(6):431-439. doi: 10.1016/j.biopsych.2017.05.017. Epub 2017 May 31.

The neurocognitive role of working memory load when Pavlovian motivational control affects instrumental learning.

PLoS Comput Biol. 2023 Dec 8;19(12):e1011692. doi: 10.1371/journal.pcbi.1011692. eCollection 2023 Dec.

Variability in Dopamine Genes Dissociates Model-Based and Model-Free Reinforcement Learning.

J Neurosci. 2016 Jan 27;36(4):1211-22. doi: 10.1523/JNEUROSCI.1901-15.2016.

Multiple memory systems as substrates for multiple decision systems.

Neurobiol Learn Mem. 2015 Jan;117:4-13. doi: 10.1016/j.nlm.2014.04.014. Epub 2014 May 15.

Cited by

Neurocognitive Biotypes of Risk and Resilience for Mood Disorders in Adolescents: Insights From Behavioral and Graph-Theoretic Network Markers.

Biol Psychiatry Glob Open Sci. 2025 Jul 8;5(6):100563. doi: 10.1016/j.bpsgos.2025.100563. eCollection 2025 Nov.

How working memory and reinforcement learning interact when avoiding punishment and pursuing reward concurrently.

J Exp Psychol Gen. 2025 Sep 1. doi: 10.1037/xge0001817.

Social inequity disrupts reward-based learning.

Commun Psychol. 2025 Aug 16;3(1):125. doi: 10.1038/s44271-025-00300-y.

Model-based exploration is measurable across tasks but not linked to personality and psychiatric assessments.

Sci Rep. 2025 Jul 28;15(1):27479. doi: 10.1038/s41598-025-09152-2.

Estimation-uncertainty affects decisions with and without learning opportunities.

Nat Commun. 2025 Jul 21;16(1):6706. doi: 10.1038/s41467-025-61960-2.

Methamphetamine-induced adaptation of learning rate dynamics depend on baseline performance.

Elife. 2025 Jul 21;13:RP101413. doi: 10.7554/eLife.101413.

Striatal dopamine can enhance both fast working memory, and slow reinforcement learning, while reducing implicit effort cost sensitivity.

Nat Commun. 2025 Jul 9;16(1):6320. doi: 10.1038/s41467-025-61099-0.

Discovering cognitive strategies with tiny recurrent neural networks.

Nature. 2025 Jul 2. doi: 10.1038/s41586-025-09142-4.

Basal ganglia deep brain stimulation restores cognitive flexibility and exploration-exploitation balance disrupted by NMDA-R antagonism.

Nat Commun. 2025 May 28;16(1):4963. doi: 10.1038/s41467-025-60044-5.

Dissociating Frontal Lobe Lesion Induced Deficits in Rule Value Learning Using Reinforcement Learning Models and a WCST Analog.

eNeuro. 2025 May 20;12(5). doi: 10.1523/ENEURO.0117-25.2025. Print 2025 May.
