Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning.

Authors

Haruno Masahiko, Kawato Mitsuo

Affiliation

ATR Computational Neuroscience Laboratories, Department of Computational Neurobiology, 2-2-2 Hikaridai, Soraku-gun, Kyoto, Japan.

Publication

Neural Netw. 2006 Oct;19(8):1242-54. doi: 10.1016/j.neunet.2006.06.007. Epub 2006 Sep 20.

DOI: 10.1016/j.neunet.2006.06.007
PMID: 16987637

Abstract

The brain's most difficult computation in decision-making learning is searching for essential information related to rewards among vast multimodal inputs and then integrating it into beneficial behaviors. Contextual cues consisting of limbic, cognitive, visual, auditory, somatosensory, and motor signals need to be associated with both rewards and actions by utilizing an internal representation such as reward prediction and reward prediction error. Previous studies have suggested that a suitable brain structure for such integration is the neural circuitry associated with multiple cortico-striatal loops. However, computational exploration still remains into how the information in and around these multiple closed loops can be shared and transferred. Here, we propose a "heterarchical reinforcement learning" model, where reward prediction made by more limbic and cognitive loops is propagated to motor loops by spiral projections between the striatum and substantia nigra, assisted by cortical projections to the pedunculopontine tegmental nucleus, which sends excitatory input to the substantia nigra. The model makes several fMRI-testable predictions of brain activity during stimulus-action-reward association learning. The caudate nucleus and the cognitive cortical areas are correlated with reward prediction error, while the putamen and motor-related areas are correlated with stimulus-action-dependent reward prediction. Furthermore, a heterogeneous activity pattern within the striatum is predicted depending on learning difficulty, i.e., the anterior medial caudate nucleus will be correlated more with reward prediction error when learning becomes difficult, while the posterior putamen will be correlated more with stimulus-action-dependent reward prediction in easy learning. Our fMRI results revealed that different cortico-striatal loops are operating, as suggested by the proposed model.
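
The two quantities the abstract contrasts, stimulus-action-dependent reward prediction and reward prediction error, can be illustrated with a generic delta-rule learner. The sketch below is not the heterarchical model itself and does not model the striato-nigral spiral or the pedunculopontine pathway. It only shows how per-trial values of the two signals could be generated for a stimulus-action-reward task. The task layout (three stimuli, two actions, one "correct" action per stimulus), the learning rate, the epsilon-greedy choice rule, and all function names are assumptions made for illustration.

```python
import random


def make_values(n_stimuli, n_actions):
    """Table of reward predictions, one entry per (stimulus, action) pair."""
    return [[0.0] * n_actions for _ in range(n_stimuli)]


def update(values, stimulus, action, reward, alpha=0.1):
    """Apply one delta-rule step and return (prediction, prediction_error).

    prediction: reward predicted for this stimulus-action pair before the outcome
    prediction_error: difference between the received and the predicted reward
    """
    prediction = values[stimulus][action]
    error = reward - prediction
    values[stimulus][action] = prediction + alpha * error
    return prediction, error


def simulate(n_trials=200, p_reward=0.9, alpha=0.1, epsilon=0.1, seed=0):
    """Run a hypothetical task; p_reward closer to 0.5 makes learning harder."""
    rng = random.Random(seed)
    n_stimuli, n_actions = 3, 2  # assumed task size, not taken from the paper
    values = make_values(n_stimuli, n_actions)
    history = []
    for _ in range(n_trials):
        s = rng.randrange(n_stimuli)
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)  # occasional exploratory choice
        else:
            a = max(range(n_actions), key=lambda i: values[s][i])  # greedy choice
        correct = a == s % n_actions  # assumed stimulus-action mapping
        p = p_reward if correct else 1.0 - p_reward
        r = 1.0 if rng.random() < p else 0.0
        history.append(update(values, s, a, r, alpha))
    return values, history
```

Calling `simulate(p_reward=0.9)` and `simulate(p_reward=0.6)` gives an easy and a hard condition under these assumptions. The per-trial `(prediction, prediction_error)` pairs in `history` are the kind of trial-by-trial signals that could be compared against brain activity, which is the sense in which the abstract calls the model's predictions fMRI-testable.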

Similar articles

1. Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning.
   Neural Netw. 2006 Oct;19(8):1242-54. doi: 10.1016/j.neunet.2006.06.007. Epub 2006 Sep 20.
2. Different neural correlates of reward expectation and reward expectation error in the putamen and caudate nucleus during stimulus-action-reward association learning.
   J Neurophysiol. 2006 Feb;95(2):948-59. doi: 10.1152/jn.00382.2005. Epub 2005 Sep 28.
3. Dorsal striatal-midbrain connectivity in humans predicts how reinforcements are used to guide decisions.
   J Cogn Neurosci. 2009 Jul;21(7):1332-45. doi: 10.1162/jocn.2009.21092.
4. Overlapping prediction errors in dorsal striatum during instrumental learning with juice and money reward in the human brain.
   J Neurophysiol. 2009 Dec;102(6):3384-91. doi: 10.1152/jn.91195.2008. Epub 2009 Sep 30.
5. Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics.
   Neural Netw. 2006 Oct;19(8):1233-41. doi: 10.1016/j.neunet.2006.05.039. Epub 2006 Sep 18.
6. Involvement of basal ganglia and orbitofrontal cortex in goal-directed behavior.
   Prog Brain Res. 2000;126:193-215. doi: 10.1016/S0079-6123(00)26015-9.
7. Spatial remapping of cortico-striatal connectivity in Parkinson's disease.
   Cereb Cortex. 2010 May;20(5):1175-86. doi: 10.1093/cercor/bhp178. Epub 2009 Aug 26.
8. [Cortico-basal ganglia circuits--parallel closed loops and convergent/divergent connections].
   Brain Nerve. 2009 Apr;61(4):351-9.
9. "Virus and epidemic": causal knowledge activates prediction error circuitry.
   J Cogn Neurosci. 2010 Oct;22(10):2151-63. doi: 10.1162/jocn.2009.21387.
10. Novelty increases the mesolimbic functional connectivity of the substantia nigra/ventral tegmental area (SN/VTA) during reward anticipation: Evidence from high-resolution fMRI.
    Neuroimage. 2011 Sep 15;58(2):647-55. doi: 10.1016/j.neuroimage.2011.06.038. Epub 2011 Jun 24.

Cited by

1. Striatal Gradient in Value-Decay Explains Regional Differences in Dopamine Patterns and Reinforcement Learning Computations.
   J Neurosci. 2025 Jul 18. doi: 10.1523/JNEUROSCI.0170-25.2025.
2. Two Separate Brain Networks for Predicting Trainability and Tracking Training-Related Plasticity in Working Dogs.
   Animals (Basel). 2024 Apr 2;14(7):1082. doi: 10.3390/ani14071082.
3. Dopamine transients follow a striatal gradient of reward time horizons.
   Nat Neurosci. 2024 Apr;27(4):737-746. doi: 10.1038/s41593-023-01566-3. Epub 2024 Feb 6.
4. Neural Correlates of Positive Outcome Expectancy for Aggression: Evidence from Voxel-Based Morphometry and Resting-State Functional Connectivity Analysis.
   Brain Sci. 2023 Dec 31;14(1):43. doi: 10.3390/brainsci14010043.
5. Motor Cortex Response to Pleasant Odor Perception and Imagery: The Differential Role of Personality Dimensions and Imagery Ability.
   Front Hum Neurosci. 2022 Jul 12;16:943469. doi: 10.3389/fnhum.2022.943469. eCollection 2022.
6. Value signals guide abstraction during learning.
   Elife. 2021 Jul 13;10:e68943. doi: 10.7554/eLife.68943.
7. Rigid reduced successor representation as a potential mechanism for addiction.
   Eur J Neurosci. 2021 Jun;53(11):3768-3790. doi: 10.1111/ejn.15227. Epub 2021 May 10.
8. Computational evidence for hierarchically structured reinforcement learning in humans.
   Proc Natl Acad Sci U S A. 2020 Nov 24;117(47):29381-29389. doi: 10.1073/pnas.1912330117.
9. Modeling nucleus accumbens: A Computational Model from Single Cell to Circuit Level.
   J Comput Neurosci. 2021 Feb;49(1):21-35. doi: 10.1007/s10827-020-00769-y. Epub 2020 Nov 9.
10. Unconscious reinforcement learning of hidden brain states supported by confidence.
    Nat Commun. 2020 Aug 31;11(1):4429. doi: 10.1038/s41467-020-17828-8.