Surprise Acts as a Reducer of Outcome Value in Human Reinforcement Learning.

Authors

Sumiya Motofumi, Katahira Kentaro

Affiliations

Department of Cognitive and Psychological Sciences, Graduate School of Informatics, Nagoya University, Nagoya, Japan.

Japan Society for the Promotion of Science, Tokyo, Japan.

Publication information

Front Neurosci. 2020 Sep 8;14:852. doi: 10.3389/fnins.2020.00852. eCollection 2020.

DOI:10.3389/fnins.2020.00852
PMID:33013288
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7506125/
Abstract

Surprise occurs because of differences between a decision outcome and its predicted outcome (prediction error), regardless of whether the error is positive or negative. It has recently been postulated that surprise affects the reward value of the action outcome; studies have indicated that increasing surprise as an absolute value of prediction error decreases the value of the outcome. However, how surprise affects the value of the outcome and subsequent decision making is unclear. We suggest that, on the assumption that surprise decreases the outcome value, agents will increase their risk-averse choices when an outcome is often surprising. Here, we propose the surprise-sensitive utility model, a reinforcement learning model that states that surprise decreases the outcome value, to explain how surprise affects subsequent decision making. To investigate the properties of the proposed model, we compare the model with previous reinforcement learning models on two probabilistic learning tasks by simulations. As a result, the proposed model explains the risk-averse choices like the previous models, and the risk-averse choices increase as the surprise-based modulation parameter of outcome value increases. We also performed statistical model selection by using two experimental datasets with different tasks. The proposed model fits these datasets better than the other models with the same number of free parameters, indicating that the model can better capture the trial-by-trial dynamics of choice behavior.
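
The abstract does not give the model's equations, so the following Python sketch is only one plausible reading of "surprise decreases the outcome value", not the authors' implementation. It assumes that surprise is the absolute prediction error, that the outcome is discounted linearly by a hypothetical parameter `d`, and that choices follow a softmax rule; the safe-versus-risky task, the function names, and all parameter values are illustrative assumptions.

```python
import math
import random


def surprise_sensitive_update(q, reward, alpha, d):
    """One value update in which surprise discounts the outcome.

    Assumed form (not taken from the abstract):
        delta    = reward - q             # prediction error
        surprise = |delta|                # unsigned prediction error
        utility  = reward - d * surprise  # surprise reduces outcome value
        q_new    = q + alpha * (utility - q)
    With d = 0 this reduces to a standard delta-rule update.
    """
    surprise = abs(reward - q)
    utility = reward - d * surprise
    return q + alpha * (utility - q)


def softmax_prob_first(q_first, q_second, beta):
    """Probability of choosing the first of two options under softmax."""
    return 1.0 / (1.0 + math.exp(-beta * (q_first - q_second)))


def simulate(d, alpha=0.2, beta=5.0, n_trials=500, seed=0):
    """Fraction of safe choices in a hypothetical safe-vs-risky task.

    Safe option: always pays 0.5. Risky option: pays 0 or 1 with equal
    probability, so both options have the same expected reward.
    """
    rng = random.Random(seed)
    q_safe, q_risky = 0.0, 0.0
    safe_choices = 0
    for _ in range(n_trials):
        if rng.random() < softmax_prob_first(q_safe, q_risky, beta):
            safe_choices += 1
            q_safe = surprise_sensitive_update(q_safe, 0.5, alpha, d)
        else:
            reward = 1.0 if rng.random() < 0.5 else 0.0
            q_risky = surprise_sensitive_update(q_risky, reward, alpha, d)
    return safe_choices / n_trials


if __name__ == "__main__":
    for d in (0.0, 0.2, 0.4):
        print(f"d={d:.1f}  proportion of safe choices={simulate(d):.2f}")
```

Under this assumed update, the expected change in the risky option's value is zero at Q = (1 - d)/2 for 0 <= d < 1, while the safe option settles at 0.5. A larger `d` therefore widens the gap in favor of the safe option, which is consistent with the abstract's claim that risk-averse choices increase with the surprise-based modulation parameter.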

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e8f/7506125/6ab8446a8ebc/fnins-14-00852-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e8f/7506125/e15894bdd308/fnins-14-00852-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e8f/7506125/229e04b1587d/fnins-14-00852-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e8f/7506125/4224815dc34f/fnins-14-00852-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e8f/7506125/ee5a76be5dbc/fnins-14-00852-g005.jpg
