Reward prediction errors, not sensory prediction errors, play a major role in model selection in human reinforcement learning.

Affiliations

School of Integrative and Global Majors, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki, 305-8573, Japan.

Faculty of Engineering, Information and Systems, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki, 305-8573, Japan.

Publication information

Neural Netw. 2022 Oct;154:109-121. doi: 10.1016/j.neunet.2022.07.002. Epub 2022 Jul 13.

Abstract

Model-based reinforcement learning enables an agent to learn in changing environments and tasks by optimizing its actions based on predicted states and outcomes. This mechanism has also been proposed to operate in the brain. However, exactly how the brain selects an appropriate model for the environment it faces has remained unclear. Here, we investigated the model selection algorithm used by the human brain during a reinforcement learning task. One prominent theory holds that model selection in the brain is driven by sensory prediction errors. We compared this theory with an alternative possibility: internal model selection driven by reward prediction errors. To compare the two theories, we devised an experiment that switches from a first-order Markov decision process to a second-order Markov decision process, providing either reward prediction errors or sensory prediction errors about the environmental change. We tested two representative computational models driven by these different prediction errors. One is the sensory-prediction-error-driven Bayesian algorithm, which has been discussed as a representative internal model selection algorithm in animal reinforcement learning tasks. The other is the reward-prediction-error-driven policy gradient algorithm. We compared the simulation results of these two computational models with human reinforcement learning behavior. The model-fitting results indicate that the policy gradient algorithm accounts for the behavior better than the Bayesian algorithm, suggesting that the human brain uses reward prediction errors to select an appropriate internal model during reinforcement learning.
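
The abstract contrasts two model-selection rules. As a rough illustration only (not the paper's actual task, models, or parameters), the sketch below pits a first-order against a second-order predictor on a toy binary sequence: the Bayesian rule updates a posterior over the two internal models from sensory prediction errors (observation mismatches), while the policy-gradient rule adjusts softmax selection weights with a REINFORCE-style update driven by reward prediction errors. All function names, dynamics, and parameter values are hypothetical.

```python
# Hedged sketch: two hypothetical model-selection rules on a toy task.
import numpy as np

rng = np.random.default_rng(0)

# Toy environment with two candidate internal models:
# model 1 predicts the next state from the last state (first-order);
# model 2 predicts it from the last two states (second-order).
def true_next_state(history, order):
    """Environment dynamics: a deterministic rule of the given order."""
    if order == 1:
        return history[-1]                 # repeat the last state
    return history[-1] ^ history[-2]       # depends on the last two states

def predict(model_order, history):
    """Each internal model's point prediction of the next state."""
    if model_order == 1:
        return history[-1]
    return history[-1] ^ history[-2]

# (a) Bayesian selection driven by sensory prediction errors:
# each observation updates the posterior over models; a sensory
# prediction error (mismatch) lowers a model's likelihood.
def bayesian_selection(observations, eps=0.1):
    log_post = np.zeros(2)                 # uniform log-prior over the models
    for t in range(2, len(observations)):
        for m, order in enumerate((1, 2)):
            hit = predict(order, observations[:t]) == observations[t]
            log_post[m] += np.log(1 - eps if hit else eps)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# (b) Policy-gradient-style selection driven by reward prediction errors:
# selection weights are nudged by the reward prediction error (RPE);
# reward arrives only when the currently selected model predicts correctly.
def rpe_selection(observations, alpha=0.3):
    w = np.zeros(2)                        # preference for each model
    v = 0.0                                # running reward estimate (baseline)
    for t in range(2, len(observations)):
        p = np.exp(w) / np.exp(w).sum()    # softmax over the two models
        m = rng.choice(2, p=p)             # sample a model to act with
        reward = 1.0 if predict(m + 1, observations[:t]) == observations[t] else 0.0
        rpe = reward - v                   # reward prediction error
        v += alpha * rpe                   # update the reward baseline
        grad = -p                          # d log p[m] / dw (softmax score)
        grad[m] += 1.0
        w += alpha * rpe * grad            # REINFORCE-style weight update
    return np.exp(w) / np.exp(w).sum()

# Generate a second-order environment; both rules should favor model 2.
obs = [0, 1]
for _ in range(200):
    obs.append(true_next_state(obs, order=2))
print("Bayesian posterior (model1, model2):", bayesian_selection(obs))
print("RPE-driven weights (model1, model2):", rpe_selection(obs))
```

On a second-order sequence both rules eventually favor the second-order model; the paper's question is which update rule better fits human trial-by-trial behavior, which this toy comparison does not address.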
