
Brain signals of a Surprise-Actor-Critic model: Evidence for multiple learning modules in human decision making.

Authors

Liakoni Vasiliki, Lehmann Marco P, Modirshanechi Alireza, Brea Johanni, Lutti Antoine, Gerstner Wulfram, Preuschoff Kerstin

Affiliations

École Polytechnique Fédérale de Lausanne (EPFL), School of Computer and Communication Sciences and School of Life Sciences, Lausanne, Switzerland.

Publication information

Neuroimage. 2022 Feb 1;246:118780. doi: 10.1016/j.neuroimage.2021.118780. Epub 2021 Dec 5.

DOI: 10.1016/j.neuroimage.2021.118780
PMID: 34875383

Abstract

Learning how to reach a reward over long series of actions is a remarkable capability of humans, and potentially guided by multiple parallel learning modules. Current brain imaging of learning modules is limited by (i) simple experimental paradigms, (ii) entanglement of brain signals of different learning modules, and (iii) a limited number of computational models considered as candidates for explaining behavior. Here, we address these three limitations and (i) introduce a complex sequential decision making task with surprising events that allows us to (ii) dissociate correlates of reward prediction errors from those of surprise in functional magnetic resonance imaging (fMRI); and (iii) we test behavior against a large repertoire of model-free, model-based, and hybrid reinforcement learning algorithms, including a novel surprise-modulated actor-critic algorithm. Surprise, derived from an approximate Bayesian approach for learning the world-model, is extracted in our algorithm from a state prediction error. Surprise is then used to modulate the learning rate of a model-free actor, which itself learns via the reward prediction error from model-free value estimation by the critic. We find that action choices are well explained by pure model-free policy gradient, but reaction times and neural data are not. We identify signatures of both model-free and surprise-based learning signals in blood oxygen level dependent (BOLD) responses, supporting the existence of multiple parallel learning modules in the brain. Our results extend previous fMRI findings to a multi-step setting and emphasize the role of policy gradient and surprise signalling in human learning.
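The update scheme described in the abstract (a critic that supplies a reward prediction error, and an actor whose learning rate is scaled by surprise from a state prediction error) can be illustrated with a minimal tabular sketch. This is not the authors' implementation. The class name `SurpriseActorCritic`, the parameters, the use of `1 - p(next_state | state, action)` from a count-based transition model as the state prediction error, and the multiplicative form `actor_lr * (1 + surprise_gain * spe)` are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class SurpriseActorCritic:
    """Hypothetical tabular actor-critic with a surprise-scaled actor step."""

    def __init__(self, n_states, n_actions, gamma=0.9, critic_lr=0.1,
                 actor_lr=0.1, surprise_gain=2.0, prior_count=1.0):
        self.gamma = gamma
        self.critic_lr = critic_lr
        self.actor_lr = actor_lr
        self.surprise_gain = surprise_gain  # assumed modulation strength
        self.values = [0.0] * n_states      # critic: state values
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]  # actor
        # World model: pseudo-counts of observed transitions (assumption).
        self.counts = [[[prior_count] * n_states for _ in range(n_actions)]
                       for _ in range(n_states)]

    def policy(self, state):
        return softmax(self.prefs[state])

    def act(self, state, rng=random):
        probs = self.policy(state)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, state, action, reward, next_state, terminal=False):
        # State prediction error: how unexpected the observed transition
        # was under the current world model (stand-in for surprise).
        row = self.counts[state][action]
        spe = 1.0 - row[next_state] / sum(row)
        row[next_state] += 1.0

        # Reward prediction error from the model-free critic (TD error).
        bootstrap = 0.0 if terminal else self.gamma * self.values[next_state]
        rpe = reward + bootstrap - self.values[state]
        self.values[state] += self.critic_lr * rpe

        # Policy-gradient step for the actor, with a surprise-scaled rate.
        lr = self.actor_lr * (1.0 + self.surprise_gain * spe)
        probs = self.policy(state)
        for a in range(len(probs)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            self.prefs[state][a] += lr * rpe * grad
        return rpe, spe
```

In this sketch, repeating the same transition lowers the state prediction error, so the actor's effective learning rate falls back toward `actor_lr` as the environment becomes predictable and rises again after an unexpected transition.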


Similar articles

1. Brain signals of a Surprise-Actor-Critic model: Evidence for multiple learning modules in human decision making.
   Neuroimage. 2022 Feb 1;246:118780. doi: 10.1016/j.neuroimage.2021.118780. Epub 2021 Dec 5.
2. Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI.
   PLoS Comput Biol. 2017 Oct 19;13(10):e1005810. doi: 10.1371/journal.pcbi.1005810. eCollection 2017 Oct.
3. Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making.
   PLoS Comput Biol. 2021 Jun 3;17(6):e1009070. doi: 10.1371/journal.pcbi.1009070. eCollection 2021 Jun.
4. Policy adjustment in a dynamic economic game.
   PLoS One. 2006 Dec 20;1(1):e103. doi: 10.1371/journal.pone.0000103.
5. Signals in human striatum are appropriate for policy update rather than value prediction.
   J Neurosci. 2011 Apr 6;31(14):5504-11. doi: 10.1523/JNEUROSCI.6316-10.2011.
6. Neural correlates of forward planning in a spatial decision task in humans.
   J Neurosci. 2011 Apr 6;31(14):5526-39. doi: 10.1523/JNEUROSCI.4647-10.2011.
7. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.
   J Cogn Neurosci. 2014 Mar;26(3):635-44. doi: 10.1162/jocn_a_00509. Epub 2013 Oct 29.
8. Reward and fictive prediction error signals in ventral striatum: asymmetry between factual and counterfactual processing.
   Brain Struct Funct. 2021 Jun;226(5):1553-1569. doi: 10.1007/s00429-021-02270-3. Epub 2021 Apr 11.
9. Surprise beyond prediction error.
   Hum Brain Mapp. 2014 Sep;35(9):4805-14. doi: 10.1002/hbm.22513. Epub 2014 Apr 3.
10. Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments.
    Neuron. 2017 Jan 18;93(2):451-463. doi: 10.1016/j.neuron.2016.12.040.

Cited by

1. Higher-order and distributed synergistic functional interactions encode information gain in goal-directed learning.
   Nat Commun. 2025 Aug 5;16(1):7179. doi: 10.1038/s41467-025-62507-1.
2. Neural substrates of parallel devaluation-sensitive and devaluation-insensitive Pavlovian learning in humans.
   Nat Commun. 2023 Dec 5;14(1):8057. doi: 10.1038/s41467-023-43747-5.