Erdeniz Burak, Done John
Department of Psychology, İzmir University of Economics, İzmir, Turkey.
Department of Psychology and Sports Sciences, School of Life and Medical Sciences, University of Hertfordshire, Hatfield, United Kingdom.
Noro Psikiyatr Ars. 2020 Jan 30;57(2):98-107. doi: 10.29399/npa.24772. eCollection 2020 Jun.
Previous studies have shown that, over the course of learning, many neurons in the medial prefrontal cortex adapt their firing rates towards the options with the highest predicted reward value; however, it has also been shown that during later learning trials the brain switches to a more automatic processing mode governed by the basal ganglia. Based on this evidence, we hypothesized that during early learning trials the predicted values of chosen options would be coded by a goal-directed system in the medial frontal cortex, whereas during late trials they would be coded by the habitual learning system in the dorsal striatum.
In this study, blood-oxygen-level-dependent (BOLD) signal data were collected with a 3 Tesla functional magnetic resonance imaging (fMRI) scanner whilst participants (N=12) performed a reinforcement learning task. The task consisted of instrumental conditioning trials in which, on each trial, the participant chose one of two available options in order to win money or avoid losing money. Depending on the experimental condition, participants received a monetary reward (money gained), a monetary penalty (money lost) or a neutral outcome.
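The abstract does not name the learning model behind its model-based analysis. As a minimal sketch, assuming a standard Q-learning (Rescorla-Wagner) agent with softmax choice, the predicted value of the chosen option on each trial (the quantity the hypothesis concerns) could be generated for a gain and a loss condition as below. The function names, learning rate `alpha`, inverse temperature `beta`, outcome probabilities, outcome magnitudes and trial count are all hypothetical, not the study's parameters.

```python
import math
import random

def softmax_choice(q, beta, rng):
    """Choose option 0 or 1; two-option softmax written in logistic form."""
    p_option1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
    return 1 if rng.random() < p_option1 else 0

def simulate_condition(outcome_probs, magnitude, n_trials=60, alpha=0.2, beta=3.0, seed=0):
    """Simulate one condition of a two-option instrumental task (hypothetical parameters).

    outcome_probs[i]: probability that option i yields the non-neutral outcome.
    magnitude: +1.0 for a gain condition, -1.0 for a loss condition (arbitrary units);
               otherwise the trial ends with a neutral outcome (0.0).
    Returns per-trial choices, predicted values of the chosen option, and prediction errors.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]  # predicted values of the two options, initialised to neutral
    choices, chosen_values, prediction_errors = [], [], []
    for _ in range(n_trials):
        c = softmax_choice(q, beta, rng)
        outcome = magnitude if rng.random() < outcome_probs[c] else 0.0
        delta = outcome - q[c]          # prediction error for the chosen option
        choices.append(c)
        chosen_values.append(q[c])      # predicted value before this trial's update
        prediction_errors.append(delta)
        q[c] += alpha * delta           # update only the chosen option
    return choices, chosen_values, prediction_errors

# Gain condition: option 1 pays off more often. Loss condition: option 0 loses more often.
gain = simulate_condition(outcome_probs=(0.2, 0.8), magnitude=1.0)
loss = simulate_condition(outcome_probs=(0.8, 0.2), magnitude=-1.0)
```

Because each update is a weighted average of the old value and the outcome, predicted values stay between the neutral outcome and the gain (or loss) magnitude, which makes them convenient as trial-by-trial regressors.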
Using a model-based analysis of the event-related fMRI design, region of interest (ROI) analyses were performed on the nucleus accumbens, medial frontal cortex, caudate nucleus, putamen, and the internal and external segments of the globus pallidus. To compare brain activity in early (goal-directed) versus late (habitual, automatic) learning trials, separate ROI analyses were performed for each anatomical sub-region. For the reward condition, we found significant activity in the medial frontal cortex (p<0.05) only in early learning trials, whereas activity shifted to the bilateral putamen (p<0.05) in later trials. For the loss condition, no significant activity was found in early trials, whereas the internal segment of the globus pallidus showed significant activity (p<0.05) in later trials.
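The abstract does not say how early and late trials were defined or how the ROI statistics were computed. One common approach, assumed here purely for illustration, is to split each condition's trials in half, enter the mean-centred predicted value of the chosen option as a separate parametric modulator for each half, and then test each ROI's beta estimates against zero across participants with a one-sample t-test. The function names and the 50/50 split are hypothetical.

```python
import math
import statistics

def split_early_late(trial_values, fraction=0.5):
    """Split per-trial model values into early and late segments.

    The split point (first half vs second half by default) is a hypothetical choice.
    """
    cut = int(len(trial_values) * fraction)
    return trial_values[:cut], trial_values[cut:]

def mean_center(values):
    """Mean-centre a parametric modulator so it models trial-by-trial variation
    around the average response rather than the average itself."""
    m = statistics.fmean(values)
    return [v - m for v in values]

def one_sample_t(betas):
    """Group-level one-sample t statistic (H0: mean ROI beta = 0) and its degrees of freedom."""
    n = len(betas)
    sd = statistics.stdev(betas)  # sample SD (n - 1 denominator)
    return statistics.fmean(betas) / (sd / math.sqrt(n)), n - 1

# Illustrative use on a made-up sequence of chosen-option values (not study data).
early, late = split_early_late([0.0, 0.2, 0.36, 0.49, 0.59, 0.67])
early_modulator, late_modulator = mean_center(early), mean_center(late)
```

Converting `t` to a p-value needs the survival function of the t distribution (for example `scipy.stats.t.sf`, if SciPy is available); it is left out to keep the sketch dependency-free.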
We found that, during reinforcement learning, brain activation shifted from medial frontal regions to dorsal regions of the striatum. These findings suggest that the brain contains two separable learning systems: an early goal-directed system and a late habitual system.