
Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics.

Authors

Tanaka Saori C, Samejima Kazuyuki, Okada Go, Ueda Kazutaka, Okamoto Yasumasa, Yamawaki Shigeto, Doya Kenji

Affiliation

Department of Bioinformatics and Genomics, Nara Institute of Science and Technology, Japan.

Publication information

Neural Netw. 2006 Oct;19(8):1233-41. doi: 10.1016/j.neunet.2006.05.039. Epub 2006 Sep 18.

DOI: 10.1016/j.neunet.2006.05.039
PMID: 16979871

Abstract

In learning goal-directed behaviors, an agent has to consider not only the reward given at each state but also the consequences of dynamic state transitions associated with action selection. To understand brain mechanisms for action learning under predictable and unpredictable environmental dynamics, we measured brain activities by functional magnetic resonance imaging (fMRI) during a Markov decision task with predictable and unpredictable state transitions. Whereas the striatum and orbitofrontal cortex (OFC) were significantly activated under both predictable and unpredictable state transition rules, the dorsolateral prefrontal cortex (DLPFC) was more strongly activated under predictable than under unpredictable state transition rules. We then modeled subjects' choice behaviors using a reinforcement learning model and a Bayesian estimation framework and found that the subjects took larger temporal discount factors under predictable state transition rules. Model-based analysis of fMRI data revealed different engagement of the striatum in reward prediction under different state transition dynamics. The ventral striatum was involved in reward prediction under both unpredictable and predictable state transition rules, whereas the dorsal striatum was predominantly involved in reward prediction under predictable rules. These results suggest different learning systems in the cortico-striatal loops depending on the dynamics of the environment: the OFC-ventral striatum loop is involved in action learning based on the present state, while the DLPFC-dorsal striatum loop is involved in action learning based on predictable future states.
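
The role of the temporal discount factor mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model or task: the function name `discounted_return`, the two reward sequences, and the values of `gamma` are all hypothetical and chosen only for illustration. The sketch uses the standard reinforcement learning definition of a discounted return, in which the reward received t steps in the future is weighted by gamma to the power t.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for t = 0, 1, 2, ...

    gamma is the temporal discount factor (0 <= gamma <= 1).
    A larger gamma gives more weight to rewards further in the future.
    """
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


# Hypothetical reward sequences (illustrative only, not from the paper):
# "immediate" pays a small reward now; "delayed" costs a little now
# but pays a larger reward two steps later.
immediate = [1.0, 0.0, 0.0]
delayed = [-1.0, 0.0, 5.0]

for gamma in (0.2, 0.9):
    v_immediate = discounted_return(immediate, gamma)
    v_delayed = discounted_return(delayed, gamma)
    preferred = "delayed" if v_delayed > v_immediate else "immediate"
    print(f"gamma={gamma}: immediate={v_immediate:.2f}, "
          f"delayed={v_delayed:.2f}, preferred={preferred}")
```

Under these assumed numbers, a small discount factor favors the immediate reward and a large one favors the delayed reward. This is the sense in which a larger estimated discount factor corresponds to choices that weigh future outcomes more heavily.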


