Towards Automaticity in Reinforcement Learning: A Model-Based Functional Magnetic Resonance Imaging Study.

Author information

Erdeniz Burak, Done John

Affiliations

Department of Psychology, İzmir University of Economics, İzmir, Turkey.

Department of Psychology and Sports Sciences, School of Life and Medical Sciences, University of Hertfordshire, Hatfield, United Kingdom.

Publication information

Noro Psikiyatr Ars. 2020 Jan 30;57(2):98-107. doi: 10.29399/npa.24772. eCollection 2020 Jun.

Abstract

INTRODUCTION

Previous studies have shown that, over the course of learning, many neurons in the medial prefrontal cortex adapt their firing rates towards the options with the highest predicted reward value. It has also been shown that during later learning trials the brain switches to a more automatic processing mode governed by the basal ganglia. Based on this evidence, we hypothesized that during early learning trials the predicted values of chosen options would be coded by a goal-directed system in the medial frontal cortex, whereas during late trials they would be coded by the habitual learning system in the dorsal striatum.

METHODS

In this study, blood oxygen level dependent (BOLD) signal data were collected with a 3 Tesla functional magnetic resonance imaging (fMRI) scanner while participants (N=12) performed a reinforcement learning task. The task consisted of instrumental conditioning trials in which, on each trial, a participant chose one of two available options in order to win money or avoid losing money. Depending on the experimental condition, participants received a monetary reward (gain money), a monetary penalty (lose money) or a neutral outcome.

RESULTS

Using model-based analysis for event-related fMRI designs, region of interest (ROI) analyses were performed on the nucleus accumbens, medial frontal cortex, caudate nucleus, putamen, and the internal and external segments of the globus pallidus. To compare brain activity in early (goal-directed) versus late (habitual, automatic) learning trials, separate ROI analyses were performed for each anatomical sub-region. For the reward condition, we found significant activity in the medial frontal cortex (p<0.05) only for early learning trials, whereas activity shifted to the bilateral putamen (p<0.05) during later trials. For the loss condition, no significant activity was found for early trials, but the internal segment of the globus pallidus showed significant activity (p<0.05) for later trials.
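The abstract does not state which learning model produced the trial-wise predicted values, so the following is only an illustrative sketch of how a model-based regressor of this kind is commonly built. It assumes a simple delta-rule (Rescorla-Wagner style) update. The learning rate `alpha`, the function names, and the midpoint split into early and late trials are all hypothetical choices, not details taken from the study.

```python
from typing import List, Sequence, Tuple


def chosen_value_regressors(
    choices: Sequence[int],
    outcomes: Sequence[float],
    alpha: float = 0.2,   # assumed learning rate, not reported in the abstract
    n_options: int = 2,   # the task offered two options per trial
) -> Tuple[List[float], List[float]]:
    """Return the predicted value of the chosen option and the prediction
    error on every trial, using a delta-rule update (an assumed model)."""
    q = [0.0] * n_options          # predicted value of each option
    values: List[float] = []
    errors: List[float] = []
    for choice, outcome in zip(choices, outcomes):
        values.append(q[choice])           # value predicted before feedback
        delta = outcome - q[choice]        # prediction error
        errors.append(delta)
        q[choice] += alpha * delta         # update only the chosen option
    return values, errors


def split_early_late(series: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a trial-wise series at its midpoint (an assumed definition of
    'early' versus 'late' learning trials)."""
    mid = len(series) // 2
    return list(series[:mid]), list(series[mid:])


if __name__ == "__main__":
    # Toy data: option index chosen on each trial and the outcome received.
    choices = [0, 0, 1, 0]
    outcomes = [1.0, 1.0, 0.0, 1.0]
    values, errors = chosen_value_regressors(choices, outcomes, alpha=0.5)
    early, late = split_early_late(values)
    print("chosen values:", values)
    print("early:", early, "late:", late)
```

In a typical model-based fMRI pipeline, series like `values` would be entered as parametric modulators of the choice events, with separate regressors for the early and late portions so that the two phases can be compared within each ROI.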

CONCLUSION

We found that, during reinforcement learning, activation in the brain shifted from the medial frontal regions to dorsal regions of the striatum. These findings suggest that there are two separable learning systems in the brain: an early goal-directed system and a late habitual system.
