Haruno Masahiko, Kawato Mitsuo
ATR Computational Neuroscience Laboratories, Department of Computational Neurobiology, 2-2-2 Hikaridai, Soraku-gun, Kyoto, Japan.
Neural Netw. 2006 Oct;19(8):1242-54. doi: 10.1016/j.neunet.2006.06.007. Epub 2006 Sep 20.
The brain's most difficult computation in decision-making learning is searching for the essential information related to rewards among vast multimodal inputs and then integrating it into beneficial behaviors. Contextual cues consisting of limbic, cognitive, visual, auditory, somatosensory, and motor signals need to be associated with both rewards and actions by utilizing internal representations such as reward prediction and reward prediction error. Previous studies have suggested that a suitable brain structure for such integration is the neural circuitry associated with multiple cortico-striatal loops. However, how the information in and around these multiple closed loops can be shared and transferred remains to be explored computationally. Here, we propose a "heterarchical reinforcement learning" model, in which reward prediction made by more limbic and cognitive loops is propagated to motor loops by spiral projections between the striatum and substantia nigra, assisted by cortical projections to the pedunculopontine tegmental nucleus, which sends excitatory input to the substantia nigra. The model makes several fMRI-testable predictions about brain activity during stimulus-action-reward association learning: the caudate nucleus and the cognitive cortical areas are correlated with reward prediction error, while the putamen and motor-related areas are correlated with stimulus-action-dependent reward prediction. Furthermore, the model predicts a heterogeneous activity pattern within the striatum depending on learning difficulty; that is, the anterior medial caudate nucleus will be correlated more strongly with reward prediction error when learning is difficult, while the posterior putamen will be correlated more strongly with stimulus-action-dependent reward prediction when learning is easy. Our fMRI results revealed that different cortico-striatal loops operate as suggested by the proposed model.
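The loop-to-loop propagation summarized above can be illustrated with a toy two-loop temporal-difference scheme. The sketch below is a minimal illustration, not the paper's actual model: a "cognitive" loop learns a stimulus-dependent reward prediction V(s), a "motor" loop learns a stimulus-action-dependent prediction Q(s,a), and the cognitive value is mixed into the motor loop's learning target to stand in for the striato-nigral spiral propagation. The task, learning rates, and coupling weight w are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_stimuli, n_actions = 4, 2
alpha, beta = 0.1, 3.0          # learning rate, softmax inverse temperature

# Reward probabilities for a simple stimulus-action-reward association task (illustrative).
reward_prob = rng.uniform(0.0, 1.0, size=(n_stimuli, n_actions))

V = np.zeros(n_stimuli)              # "cognitive loop": stimulus-dependent reward prediction
Q = np.zeros((n_stimuli, n_actions)) # "motor loop": stimulus-action-dependent reward prediction

for trial in range(5000):
    s = rng.integers(n_stimuli)

    # Action selection from the motor loop (softmax over Q values).
    p = np.exp(beta * Q[s]); p /= p.sum()
    a = rng.choice(n_actions, p=p)
    r = float(rng.random() < reward_prob[s, a])

    # Cognitive loop: reward prediction error for the stimulus alone.
    delta_cog = r - V[s]
    V[s] += alpha * delta_cog

    # Motor loop: its learning target mixes the obtained reward with the value
    # propagated from the cognitive loop (stand-in for the "spiral" coupling, weight w).
    w = 0.5
    target = (1 - w) * r + w * V[s]
    Q[s, a] += alpha * (target - Q[s, a])

print("V (stimulus-dependent reward prediction):", np.round(V, 2))
print("Q (stimulus-action-dependent reward prediction):\n", np.round(Q, 2))
```

In this toy setup the motor loop's estimates converge faster when they can lean on the already-learned stimulus values, which is the intuition behind letting more limbic and cognitive loops assist the motor loops.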