

Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters.

Affiliations

INSERM U846, Stem Cell and Brain Research Institute, Bron, France.

Publication information

Prog Brain Res. 2013;202:441-64. doi: 10.1016/B978-0-444-62604-2.00022-8.

DOI: 10.1016/B978-0-444-62604-2.00022-8
PMID: 23317844
Abstract

Converging evidence suggests that the medial prefrontal cortex (MPFC) is involved in feedback categorization, performance monitoring, and task monitoring, and may contribute to the online regulation of reinforcement learning (RL) parameters that affect decision-making processes in the lateral prefrontal cortex (LPFC). Previous neurophysiological experiments have shown MPFC activities encoding error likelihood, uncertainty, and reward volatility, as well as neural responses categorizing different types of feedback, for instance distinguishing between choice errors and execution errors. Rushworth and colleagues have proposed that the involvement of the MPFC in tracking task volatility could contribute to the regulation of one RL parameter, the learning rate. We extend this hypothesis by proposing that the MPFC could also regulate other RL parameters, such as the exploration rate and the default action values used after task shifts. Here, we analyze the sensitivity of behavioral performance to RL parameters in two monkey decision-making tasks, one with a deterministic reward schedule and the other with a stochastic one. We show that optimal parameter values exist that are specific to each task, that they must be found for optimal performance, and that they are usually hand-tuned in computational models. In contrast, automatic online regulation of these parameters using simple heuristics can produce good, although non-optimal, behavioral performance in each task. We finally describe our computational model of MPFC-LPFC interaction used for the online regulation of the exploration rate, and its application to a human-robot interaction scenario in which unexpected uncertainties are produced by the human introducing cued task changes or by cheating. The model enables the robot to autonomously learn to reset exploration in response to such uncertain cues and events.
The combined results provide concrete evidence specifying how prefrontal cortical subregions may cooperate to regulate RL parameters. They also show how such neurophysiologically inspired mechanisms can control advanced robots in the real world. Finally, the model's learning mechanisms, which were challenged in the last robotic scenario, provide testable predictions about how monkeys may learn the structure of the task during the pretraining phase of the earlier laboratory experiments.
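The abstract's central idea — a meta-level surprise signal (of the kind attributed to MPFC) regulating a lower-level RL parameter such as the softmax exploration rate, so that exploration resets after a task shift — can be sketched in a toy simulation. The specific heuristic below (inverse temperature driven by a running average of prediction-error magnitude) is an illustrative assumption, not the authors' actual model:

```python
import math
import random

def softmax_choice(q_values, beta):
    """Sample an action via softmax; higher beta means greedier choices."""
    exps = [math.exp(beta * q) for q in q_values]
    r = random.random() * sum(exps)
    cum = 0.0
    for i, e in enumerate(exps):
        cum += e
        if r <= cum:
            return i
    return len(exps) - 1

def run(n_trials=200, alpha=0.1):
    """Two-armed bandit with a deterministic reward schedule that reverses
    mid-session. The exploration rate (inverse temperature beta) is itself
    adapted online from a running average of reward-prediction-error
    magnitude: surprising outcomes lower beta, i.e., reset exploration."""
    q = [0.0, 0.0]          # action values, learned by the low-level RL loop
    beta = 2.0              # exploration parameter, regulated by the meta-loop
    avg_surprise = 0.0      # crude volatility proxy (hypothetical heuristic)
    total_reward = 0.0
    for t in range(n_trials):
        best = 0 if t < n_trials // 2 else 1   # task shift at mid-session
        a = softmax_choice(q, beta)
        reward = 1.0 if a == best else 0.0
        delta = reward - q[a]                  # reward prediction error
        q[a] += alpha * delta                  # standard value update
        # Meta-regulation: track |delta| and map it to the exploration rate.
        avg_surprise += 0.1 * (abs(delta) - avg_surprise)
        beta = 1.0 / (0.1 + avg_surprise)
        total_reward += reward
    return total_reward / n_trials
```

After the mid-session reversal, the spike in prediction-error magnitude drives beta down, broadening the softmax and letting the agent rediscover the rewarded action faster than a fixed, hand-tuned beta would.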


Similar articles

1
Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters.
Prog Brain Res. 2013;202:441-64. doi: 10.1016/B978-0-444-62604-2.00022-8.
2
Reward-dependent learning in neuronal networks for planning and decision making.
Prog Brain Res. 2000;126:217-29. doi: 10.1016/S0079-6123(00)26016-0.
3
Behavioral Regulation and the Modulation of Information Coding in the Lateral Prefrontal and Cingulate Cortex.
Cereb Cortex. 2015 Sep;25(9):3197-218. doi: 10.1093/cercor/bhu114. Epub 2014 Jun 5.
4
Mechanisms of reinforcement learning and decision making in the primate dorsolateral prefrontal cortex.
Ann N Y Acad Sci. 2007 May;1104:108-22. doi: 10.1196/annals.1390.007. Epub 2007 Mar 8.
5
Multiple representations of belief states and action values in corticobasal ganglia loops.
Ann N Y Acad Sci. 2007 May;1104:213-28. doi: 10.1196/annals.1390.024. Epub 2007 Apr 13.
6
From conflict management to reward-based decision making: actors and critics in primate medial frontal cortex.
Neurosci Biobehav Rev. 2014 Oct;46 Pt 1:44-57. doi: 10.1016/j.neubiorev.2013.11.003. Epub 2013 Nov 15.
7
Robot cognitive control with a neurophysiologically inspired reinforcement learning model.
Front Neurorobot. 2011 Jul 12;5:1. doi: 10.3389/fnbot.2011.00001. eCollection 2011.
8
[Neural mechanisms of decision making].
Brain Nerve. 2008 Sep;60(9):1017-27.
9
Banishing the homunculus: making working memory work.
Neuroscience. 2006 Apr 28;139(1):105-18. doi: 10.1016/j.neuroscience.2005.04.067. Epub 2005 Dec 15.
10
Predicting Motivation: Computational Models of PFC Can Explain Neural Coding of Motivation and Effort-based Decision-making in Health and Disease.
J Cogn Neurosci. 2017 Oct;29(10):1633-1645. doi: 10.1162/jocn_a_01160. Epub 2017 Jun 27.

Cited by

1
Dynamic prefrontal coupling coordinates adaptive decision-making.
Res Sq. 2025 Apr 9:rs.3.rs-6296852. doi: 10.21203/rs.3.rs-6296852/v1.
2
Meta-Reinforcement Learning reconciles surprise, value, and control in the anterior cingulate cortex.
PLoS Comput Biol. 2025 Apr 22;21(4):e1013025. doi: 10.1371/journal.pcbi.1013025. eCollection 2025 Apr.
3
Anterior Cingulate Cortex Causally Supports Meta-Learning.
bioRxiv. 2024 Jun 13:2024.06.12.598723. doi: 10.1101/2024.06.12.598723.
4
A neurocognitive model of early onset persistent and desistant antisocial behavior in early adulthood.
Front Hum Neurosci. 2023 Jul 18;17:1100277. doi: 10.3389/fnhum.2023.1100277. eCollection 2023.
5
Debates on the dorsomedial prefrontal/dorsal anterior cingulate cortex: insights for future research.
Brain. 2023 Dec 1;146(12):4826-4844. doi: 10.1093/brain/awad263.
6
Learning at Variable Attentional Load Requires Cooperation of Working Memory, Meta-learning, and Attention-augmented Reinforcement Learning.
J Cogn Neurosci. 2021 Dec 6;34(1):79-107. doi: 10.1162/jocn_a_01780.
7
Temporal chunking as a mechanism for unsupervised learning of task-sets.
Elife. 2020 Mar 9;9:e50469. doi: 10.7554/eLife.50469.
8
Impacts of inter-trial interval duration on a computational model of sign-tracking vs. goal-tracking behaviour.
Psychopharmacology (Berl). 2019 Aug;236(8):2373-2388. doi: 10.1007/s00213-019-05323-y. Epub 2019 Jul 31.
9
Dopamine blockade impairs the exploration-exploitation trade-off in rats.
Sci Rep. 2019 May 1;9(1):6770. doi: 10.1038/s41598-019-43245-z.
10
Solving the Credit Assignment Problem With the Prefrontal Cortex.
Front Neurosci. 2018 Mar 27;12:182. doi: 10.3389/fnins.2018.00182. eCollection 2018.