Towards Automaticity in Reinforcement Learning: A Model-Based Functional Magnetic Resonance Imaging Study.

Authors

Erdeniz Burak, Done John

Affiliations

Department of Psychology, İzmir University of Economics, İzmir, Turkey.

Department of Psychology and Sports Sciences, School of Life and Medical Sciences, University of Hertfordshire, Hatfield, United Kingdom.

Publication information

Noro Psikiyatr Ars. 2020 Jan 30;57(2):98-107. doi: 10.29399/npa.24772. eCollection 2020 Jun.

DOI:10.29399/npa.24772
PMID:32550774
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7285637/
Abstract

INTRODUCTION

Previous studies have shown that, over the course of learning, many neurons in the medial prefrontal cortex adapt their firing rates towards the options with the highest predicted reward value. It has also been shown that during later learning trials the brain switches to a more automatic processing mode governed by the basal ganglia. Based on this evidence, we hypothesized that during early learning trials the predicted values of chosen options would be coded by a goal-directed system in the medial frontal cortex, whereas during late trials they would be coded by the habitual learning system in the dorsal striatum.

METHODS

In this study, blood oxygen level dependent (BOLD) signal data were collected with a 3 Tesla functional magnetic resonance imaging (fMRI) scanner whilst participants (N=12) performed a reinforcement learning task. The task consisted of instrumental conditioning trials, in each of which a participant chose one of two available options in order to win or avoid losing money. Depending on the experimental condition, participants received a monetary reward (gain money), a monetary penalty (lose money), or a neutral outcome.

RESULTS

A model-based analysis for event-related fMRI designs was used, and region of interest (ROI) analyses were performed on the nucleus accumbens, medial frontal cortex, caudate nucleus, putamen, and the internal and external segments of the globus pallidus. To compare brain activity in early (goal-directed) versus late (habitual, automatic) learning trials, separate ROI analyses were performed for each anatomical sub-region. For the reward condition, we found significant activity in the medial frontal cortex (p<0.05) only for early learning trials, whereas activity shifted to the bilateral putamen (p<0.05) during later trials. For the loss condition, however, no significant activity was found for early trials, whereas the internal segment of the globus pallidus showed significant activity (p<0.05) for later trials.
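
As an illustration of what a model-based analysis of this kind typically involves, the sketch below simulates a two-option task with a simple Rescorla-Wagner (Q-learning) update and a softmax choice rule, then splits the trial-by-trial predicted values of the chosen option into early and late halves. The abstract does not state which learning model, parameters, or early/late cut-off were used, so the model, the function names, the learning rate `alpha`, the inverse temperature `beta`, the reward probabilities, and the half-way split are all assumptions made for illustration only. In an fMRI analysis, such value series would usually be convolved with a haemodynamic response function and entered as parametric regressors, a step that is not shown here.

```python
import math
import random


def simulate_chosen_values(n_trials=60, alpha=0.2, beta=3.0,
                           p_reward=(0.8, 0.2), seed=0):
    """Simulate a two-option learning task (hypothetical parameters).

    Returns the predicted value of the chosen option on each trial
    (taken before the update) and the reward prediction error.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]  # current predicted value of each option
    chosen_values, prediction_errors = [], []
    for _ in range(n_trials):
        # Softmax choice rule for two options
        p_choose_0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p_choose_0 else 1
        # Probabilistic binary outcome: 1.0 = reward, 0.0 = no reward
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        delta = reward - q[choice]  # reward prediction error
        chosen_values.append(q[choice])
        prediction_errors.append(delta)
        # Rescorla-Wagner update of the chosen option only
        q[choice] += alpha * delta
    return chosen_values, prediction_errors


def split_early_late(series):
    """Split a trial series into early and late halves (assumed cut-off)."""
    half = len(series) // 2
    return series[:half], series[half:]


values, errors = simulate_chosen_values()
early_values, late_values = split_early_late(values)
print(len(early_values), len(late_values))
```

Separate early and late value series like these would allow the same ROI to be tested against each learning phase, which is the kind of comparison the results describe.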

CONCLUSION

We found that during reinforcement learning, brain activation shifted from medial frontal regions to dorsal regions of the striatum. These findings suggest that there are two separable learning systems in the brain: an early goal-directed system and a late habitual one.
