The neural architecture of theory-based reinforcement learning.

Affiliations

Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Motional AD, Inc., Boston, MA 02210, USA.

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Publication information

Neuron. 2023 Apr 19;111(8):1331-1344.e8. doi: 10.1016/j.neuron.2023.01.023. Epub 2023 Mar 9.

DOI:10.1016/j.neuron.2023.01.023

PMID:36898374

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10200004/

Abstract

Humans learn internal models of the world that support planning and generalization in complex environments. Yet it remains unclear how such internal models are represented and learned in the brain. We approach this question using theory-based reinforcement learning, a strong form of model-based reinforcement learning in which the model is a kind of intuitive theory. We analyzed fMRI data from human participants learning to play Atari-style games. We found evidence of theory representations in prefrontal cortex and of theory updating in prefrontal cortex, occipital cortex, and fusiform gyrus. Theory updates coincided with transient strengthening of theory representations. Effective connectivity during theory updating suggests that information flows from prefrontal theory-coding regions to posterior theory-updating regions. Together, our results are consistent with a neural architecture in which top-down theory representations originating in prefrontal regions shape sensory predictions in visual areas, where factored theory prediction errors are computed and trigger bottom-up updates of the theory.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb73/10200004/48d4313e5e64/nihms-1882586-f0001.jpg
