Neural correlates of forward planning in a spatial decision task in humans.

Affiliation

Department of Psychology, Center for Neural Science, New York University, New York, New York 10003, USA.

Publication information

J Neurosci. 2011 Apr 6;31(14):5526-39. doi: 10.1523/JNEUROSCI.4647-10.2011.

Abstract

Although reinforcement learning (RL) theories have been influential in characterizing the mechanisms for reward-guided choice in the brain, the predominant temporal difference (TD) algorithm cannot explain many flexible or goal-directed actions that have been demonstrated behaviorally. We investigate such actions by contrasting an RL algorithm that is model based, in that it relies on learning a map or model of the task and planning within it, to traditional model-free TD learning. To distinguish these approaches in humans, we used functional magnetic resonance imaging in a continuous spatial navigation task, in which frequent changes to the layout of the maze forced subjects continually to relearn their favored routes, thereby exposing the RL mechanisms used. We sought evidence for the neural substrates of such mechanisms by comparing choice behavior and blood oxygen level-dependent (BOLD) signals to decision variables extracted from simulations of either algorithm. Both choices and value-related BOLD signals in striatum, although most often associated with TD learning, were better explained by the model-based theory. Furthermore, predecessor quantities for the model-based value computation were correlated with BOLD signals in the medial temporal lobe and frontal cortex. These results point to a significant extension of both the computational and anatomical substrates for RL in the brain.
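The contrast the abstract draws between model-free TD learning and model-based planning can be illustrated with a toy example. The sketch below is not the authors' task or fitted models: the four-state maze, the action names, and the parameters `GAMMA` and `ALPHA` are all assumptions made for illustration. It shows a cached TD value being updated from experienced steps, next to values recomputed by planning over a learned map, and what each does when the layout changes.

```python
# Toy maze (hypothetical, not the task from the paper):
# state 0 is the start, state 3 is the goal.
# From 0, "short" leads straight to 3; "long" goes 0 -> 1 -> 2 -> 3.
GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.5   # TD learning rate (assumed)

def td_update(q, state, action, reward, next_state, next_actions):
    """Model-free TD (Q-learning) step: nudge one cached value toward
    reward + discounted best cached value at the next state."""
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    target = reward + GAMMA * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (target - old)

def plan_values(model, rewards, sweeps=50):
    """Model-based planning: value iteration over a learned map
    of the form {state: {action: next_state}}."""
    values = {s: 0.0 for s in model}
    for _ in range(sweeps):
        for s, actions in model.items():
            if actions:  # terminal states have no actions and stay at 0
                values[s] = max(rewards.get(nxt, 0.0) + GAMMA * values[nxt]
                                for nxt in actions.values())
    return values

rewards = {3: 1.0}  # reward received on arriving at the goal
model = {0: {"short": 3, "long": 1}, 1: {"go": 2}, 2: {"go": 3}, 3: {}}

# Model-free learner: repeatedly experiences the short route only.
q = {}
for _ in range(20):
    td_update(q, 0, "short", 1.0, 3, [])

before = plan_values(model, rewards)

# Layout change: the short corridor is blocked. The model-based learner
# edits its map and replans; the TD learner's cached values are unchanged
# until it experiences the long route step by step.
del model[0]["short"]
after = plan_values(model, rewards)

print("planned value of start before change:", before[0])
print("planned value of start after change:", after[0])
print("cached TD value of the long route:", q.get((0, "long"), 0.0))
```

In this sketch the planner assigns the long route a value as soon as the map is edited, while the TD learner still holds no estimate for it. Frequent layout changes separate the two accounts for this reason, which is the logic the abstract describes for the maze task.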



