Reward prediction errors, not sensory prediction errors, play a major role in model selection in human reinforcement learning.

Affiliations

School of Integrative and Global Majors, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki, 305-8573, Japan.

Faculty of Engineering, Information and Systems, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki, 305-8573, Japan.

Publication information

Neural Netw. 2022 Oct;154:109-121. doi: 10.1016/j.neunet.2022.07.002. Epub 2022 Jul 13.

Abstract

Model-based reinforcement learning enables an agent to learn in changing environments and tasks by optimizing its actions based on predicted states and outcomes. This mechanism has also been proposed to operate in the brain. However, exactly how the brain selects an appropriate model for the environment it faces has remained unclear. Here, we investigated the model selection algorithm used by the human brain during a reinforcement learning task. One prominent theory holds that model selection in the brain is driven by sensory prediction errors. We compared this theory with an alternative possibility: internal model selection driven by reward prediction errors. To compare the two theories, we devised an experiment that switches from a first-order Markov decision process to a second-order Markov decision process, providing either reward prediction errors or sensory prediction errors about the environmental change. We tested two representative computational models driven by these different prediction errors. One is the sensory-prediction-error-driven Bayesian algorithm, which has been discussed as a representative internal model selection algorithm in animal reinforcement learning tasks. The other is the reward-prediction-error-driven policy gradient algorithm. We compared the simulation results of these two computational models with human reinforcement learning behavior. The model-fitting results indicate that the policy gradient algorithm accounts for the behavior better than the Bayesian algorithm, suggesting that the human brain uses reward prediction errors to select an appropriate internal model during reinforcement learning.
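
The abstract contrasts two model-selection rules. As a rough illustration only (not the paper's actual task, models, or parameters), the sketch below pits a first-order against a second-order predictor on a toy binary sequence: the Bayesian rule updates a posterior over the two internal models from sensory prediction errors (observation mismatches), while the policy-gradient rule adjusts softmax selection weights with a REINFORCE-style update driven by reward prediction errors. All function names, dynamics, and parameter values are hypothetical.

```python
# Hedged sketch: two hypothetical model-selection rules on a toy task.
import numpy as np

rng = np.random.default_rng(0)

# Toy environment with two candidate internal models:
# model 1 predicts the next state from the last state (first-order);
# model 2 predicts it from the last two states (second-order).
def true_next_state(history, order):
    """Environment dynamics: a deterministic rule of the given order."""
    if order == 1:
        return history[-1]                 # repeat the last state
    return history[-1] ^ history[-2]       # depends on the last two states

def predict(model_order, history):
    """Each internal model's point prediction of the next state."""
    if model_order == 1:
        return history[-1]
    return history[-1] ^ history[-2]

# (a) Bayesian selection driven by sensory prediction errors:
# each observation updates the posterior over models; a sensory
# prediction error (mismatch) lowers a model's likelihood.
def bayesian_selection(observations, eps=0.1):
    log_post = np.zeros(2)                 # uniform log-prior over the models
    for t in range(2, len(observations)):
        for m, order in enumerate((1, 2)):
            hit = predict(order, observations[:t]) == observations[t]
            log_post[m] += np.log(1 - eps if hit else eps)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# (b) Policy-gradient-style selection driven by reward prediction errors:
# selection weights are nudged by the reward prediction error (RPE);
# reward arrives only when the currently selected model predicts correctly.
def rpe_selection(observations, alpha=0.3):
    w = np.zeros(2)                        # preference for each model
    v = 0.0                                # running reward estimate (baseline)
    for t in range(2, len(observations)):
        p = np.exp(w) / np.exp(w).sum()    # softmax over the two models
        m = rng.choice(2, p=p)             # sample a model to act with
        reward = 1.0 if predict(m + 1, observations[:t]) == observations[t] else 0.0
        rpe = reward - v                   # reward prediction error
        v += alpha * rpe                   # update the reward baseline
        grad = -p                          # d log p[m] / dw (softmax score)
        grad[m] += 1.0
        w += alpha * rpe * grad            # REINFORCE-style weight update
    return np.exp(w) / np.exp(w).sum()

# Generate a second-order environment; both rules should favor model 2.
obs = [0, 1]
for _ in range(200):
    obs.append(true_next_state(obs, order=2))
print("Bayesian posterior (model1, model2):", bayesian_selection(obs))
print("RPE-driven weights (model1, model2):", rpe_selection(obs))
```

On a second-order sequence both rules eventually favor the second-order model; the paper's question is which update rule better fits human trial-by-trial behavior, which this toy comparison does not address.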
