Haruno Masahiko, Kawato Mitsuo
ATR Computational Neuroscience Laboratories, Department of Computational Neurobiology, 2-2-2 Hikaridai, Soraku-gun, Kyoto, Japan.
Neural Netw. 2006 Oct;19(8):1242-54. doi: 10.1016/j.neunet.2006.06.007. Epub 2006 Sep 20.
The brain's most difficult computation in decision-making learning is searching for the essential information related to rewards among vast multimodal inputs and then integrating it into beneficial behaviors. Contextual cues consisting of limbic, cognitive, visual, auditory, somatosensory, and motor signals need to be associated with both rewards and actions by utilizing internal representations such as reward prediction and reward prediction error. Previous studies have suggested that a suitable brain structure for such integration is the neural circuitry associated with multiple cortico-striatal loops. However, how the information in and around these multiple closed loops can be shared and transferred remains to be explored computationally. Here, we propose a "heterarchical reinforcement learning" model, in which reward prediction made by more limbic and cognitive loops is propagated to motor loops by spiral projections between the striatum and substantia nigra, assisted by cortical projections to the pedunculopontine tegmental nucleus, which sends excitatory input to the substantia nigra. The model makes several fMRI-testable predictions about brain activity during stimulus-action-reward association learning: the caudate nucleus and the cognitive cortical areas are correlated with reward prediction error, while the putamen and motor-related areas are correlated with stimulus-action-dependent reward prediction. Furthermore, the model predicts a heterogeneous activity pattern within the striatum depending on learning difficulty; that is, the anterior medial caudate nucleus will be correlated more strongly with reward prediction error when learning is difficult, while the posterior putamen will be correlated more strongly with stimulus-action-dependent reward prediction when learning is easy. Our fMRI results revealed that different cortico-striatal loops operate as suggested by the proposed model.
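The loop-to-loop propagation summarized above can be illustrated with a toy two-loop temporal-difference scheme. The sketch below is a minimal illustration, not the paper's actual model: a "cognitive" loop learns a stimulus-dependent reward prediction V(s), a "motor" loop learns a stimulus-action-dependent prediction Q(s,a), and the cognitive value is mixed into the motor loop's learning target to stand in for the striato-nigral spiral propagation. The task, learning rates, and coupling weight w are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_stimuli, n_actions = 4, 2
alpha, beta = 0.1, 3.0          # learning rate, softmax inverse temperature

# Reward probabilities for a simple stimulus-action-reward association task (illustrative).
reward_prob = rng.uniform(0.0, 1.0, size=(n_stimuli, n_actions))

V = np.zeros(n_stimuli)              # "cognitive loop": stimulus-dependent reward prediction
Q = np.zeros((n_stimuli, n_actions)) # "motor loop": stimulus-action-dependent reward prediction

for trial in range(5000):
    s = rng.integers(n_stimuli)

    # Action selection from the motor loop (softmax over Q values).
    p = np.exp(beta * Q[s]); p /= p.sum()
    a = rng.choice(n_actions, p=p)
    r = float(rng.random() < reward_prob[s, a])

    # Cognitive loop: reward prediction error for the stimulus alone.
    delta_cog = r - V[s]
    V[s] += alpha * delta_cog

    # Motor loop: its learning target mixes the obtained reward with the value
    # propagated from the cognitive loop (stand-in for the "spiral" coupling, weight w).
    w = 0.5
    target = (1 - w) * r + w * V[s]
    Q[s, a] += alpha * (target - Q[s, a])

print("V (stimulus-dependent reward prediction):", np.round(V, 2))
print("Q (stimulus-action-dependent reward prediction):\n", np.round(Q, 2))
```

In this toy setup the motor loop's estimates converge faster when they can lean on the already-learned stimulus values, which is the intuition behind letting more limbic and cognitive loops assist the motor loops.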