
Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters.

Affiliation

INSERM U846, Stem Cell and Brain Research Institute, Bron, France.

Publication

Prog Brain Res. 2013;202:441-64. doi: 10.1016/B978-0-444-62604-2.00022-8.

Abstract

Converging evidence suggests that the medial prefrontal cortex (MPFC) is involved in feedback categorization, performance monitoring, and task monitoring, and may contribute to the online regulation of reinforcement learning (RL) parameters that affect decision-making processes in the lateral prefrontal cortex (LPFC). Previous neurophysiological experiments have shown MPFC activity encoding error likelihood, uncertainty, and reward volatility, as well as neural responses categorizing different types of feedback, for instance distinguishing between choice errors and execution errors. Rushworth and colleagues have proposed that the involvement of MPFC in tracking the volatility of the task could contribute to the regulation of one of the RL parameters, the learning rate. We extend this hypothesis by proposing that MPFC could contribute to the regulation of other RL parameters, such as the exploration rate and the default action values in the case of task shifts. Here, we analyze the sensitivity of behavioral performance to RL parameters in two monkey decision-making tasks, one with a deterministic reward schedule and the other with a stochastic one. We show that there exist optimal parameter values specific to each of these tasks, which need to be found for optimal performance and which are usually hand-tuned in computational models. In contrast, automatic online regulation of these parameters using simple heuristics can help produce good, although non-optimal, behavioral performance in each task. We then describe our computational model of MPFC-LPFC interaction used for online regulation of the exploration rate, and its application to a human-robot interaction scenario. There, unexpected uncertainties are produced by the human, who introduces cued task changes or cheats. The model enables the robot to autonomously learn to reset exploration in response to such uncertain cues and events. The combined results provide concrete evidence of how prefrontal cortical subregions may cooperate to regulate RL parameters. They also show how such neurophysiologically inspired mechanisms can control advanced robots in the real world. Finally, the model's learning mechanisms, challenged in the last robotic scenario, provide testable predictions about how monkeys may learn the structure of the task during the pretraining phase of the earlier laboratory experiments.
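
The abstract centers on two tunable RL parameters, the learning rate and the exploration rate (the softmax inverse temperature), and on heuristics that adjust them online when the task changes. Below is a minimal sketch, assuming a tabular Q-learning agent with softmax action selection under a stochastic reward schedule; the surprise-driven rule that lowers the inverse temperature after unexpected outcomes is a hypothetical illustration of online parameter regulation, not the authors' MPFC-LPFC model, and all names are illustrative.

```python
import numpy as np

# Minimal sketch: tabular Q-learning with softmax action selection.
# alpha (learning rate) and beta (inverse temperature, i.e. exploration
# rate) are the kind of parameters the abstract discusses. The
# surprise-based adjustment of beta is an illustrative heuristic only.

rng = np.random.default_rng(0)

n_actions = 4
alpha = 0.1                      # learning rate
beta = 5.0                       # high beta -> exploitation, low beta -> exploration
q = np.zeros(n_actions)          # action values
surprise = 0.0                   # running average of unsigned prediction error


def softmax_choice(q_values, inv_temp):
    """Sample an action from a softmax over action values."""
    z = inv_temp * q_values
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(q_values), p=p)


def trial(reward_probs):
    """One trial under a stochastic reward schedule, with online beta regulation."""
    global beta, surprise
    a = softmax_choice(q, beta)
    r = float(rng.random() < reward_probs[a])
    delta = r - q[a]                       # reward prediction error
    q[a] += alpha * delta                  # value update (learning rate alpha)
    surprise = 0.9 * surprise + 0.1 * abs(delta)
    beta = 5.0 / (1.0 + 5.0 * surprise)    # reopen exploration when surprise is high
    return a, r


# Example: a reversal of reward probabilities raises surprise, which lowers
# beta and re-opens exploration until the action values are relearned.
probs = np.array([0.8, 0.2, 0.2, 0.2])
for t in range(400):
    if t == 200:
        probs = probs[::-1]                # uncued task shift
    trial(probs)
```

In the abstract's framing, MPFC-like monitoring signals would supply the surprise term and the regulation rule, while LPFC-like circuits would implement the value-based choice itself.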

