

A Comparison Model of Reinforcement-Learning and Win-Stay-Lose-Shift Decision-Making Processes: A Tribute to W.K. Estes.

Authors

Worthy Darrell A, Maddox W Todd

Affiliations

Texas A&M University.

The University of Texas at Austin.

Publication

J Math Psychol. 2014 Apr 1;59:41-49. doi: 10.1016/j.jmp.2013.10.001.

DOI: 10.1016/j.jmp.2013.10.001
PMID: 25214675
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4159167/
Abstract

W.K. Estes often championed an approach to model development whereby an existing model was augmented by the addition of one or more free parameters, and a comparison between the simple and more complex, augmented model determined whether the additions were justified. Following this same approach we utilized Estes' (1950) own augmented learning equations to improve the fit and plausibility of a win-stay-lose-shift (WSLS) model that we have used in much of our recent work. Estes also championed models that assumed a comparison between multiple concurrent cognitive processes. In line with this, we develop a WSLS-Reinforcement Learning (RL) model that assumes that the output of a WSLS process that provides a probability of staying or switching to a different option based on the last two decision outcomes is compared with the output of an RL process that determines a probability of selecting each option based on a comparison of the expected value of each option. Fits to data from three different decision-making experiments suggest that the augmentations to the WSLS and RL models lead to a better account of decision-making behavior. Our results also support the assertion that human participants weigh the output of WSLS and RL processes during decision-making.
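The abstract describes a comparison between two concurrent processes: a WSLS rule that outputs stay/shift probabilities from recent outcomes, and an RL process that outputs choice probabilities from expected values, with participants weighing the two outputs. The following is a minimal illustrative sketch of that idea, not the authors' actual model: it assumes a simplified one-trial-back WSLS rule (the paper's augmented model conditions on the last two outcomes), and the parameter names (`w`, `gamma`, `p_stay_win`, `p_shift_loss`) are hypothetical.

```python
import numpy as np

def softmax(values, gamma):
    """RL process: convert expected values into choice probabilities."""
    v = gamma * np.asarray(values, dtype=float)
    v -= v.max()                      # subtract max for numerical stability
    e = np.exp(v)
    return e / e.sum()

def wsls_probs(last_choice, last_reward, p_stay_win, p_shift_loss, n_options=2):
    """WSLS process (one-trial-back simplification): stay after a win,
    shift after a loss, each with its own probability."""
    p_stay = p_stay_win if last_reward else 1.0 - p_shift_loss
    p = np.full(n_options, (1.0 - p_stay) / (n_options - 1))
    p[last_choice] = p_stay
    return p

def mixture_probs(q_values, last_choice, last_reward, w,
                  gamma, p_stay_win, p_shift_loss):
    """Weighted comparison of the two outputs; w is the weight on RL."""
    p_rl = softmax(q_values, gamma)
    p_wsls = wsls_probs(last_choice, last_reward, p_stay_win, p_shift_loss,
                        n_options=len(q_values))
    return w * p_rl + (1.0 - w) * p_wsls

# Example: after a rewarded choice of option 0, both processes favor option 0.
p = mixture_probs([1.0, 0.5], last_choice=0, last_reward=True,
                  w=0.5, gamma=2.0, p_stay_win=0.9, p_shift_loss=0.8)
```

In a model-fitting context, `w` and the WSLS parameters would be free parameters estimated per participant, which is how one would test whether the augmentations are justified in Estes' sense.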


Similar Articles

1
A Comparison Model of Reinforcement-Learning and Win-Stay-Lose-Shift Decision-Making Processes: A Tribute to W.K. Estes.
J Math Psychol. 2014 Apr 1;59:41-49. doi: 10.1016/j.jmp.2013.10.001.
2
Heterogeneity of strategy use in the Iowa gambling task: a comparison of win-stay/lose-shift and reinforcement learning models.
Psychon Bull Rev. 2013 Apr;20(2):364-71. doi: 10.3758/s13423-012-0324-9.
3
Probing relationships between reinforcement learning and simple behavioral strategies to understand probabilistic reward learning.
J Neurosci Methods. 2020 Jul 15;341:108777. doi: 10.1016/j.jneumeth.2020.108777. Epub 2020 May 15.
4
Decomposing the roles of perseveration and expected value representation in models of the Iowa gambling task.
Front Psychol. 2013 Sep 30;4:640. doi: 10.3389/fpsyg.2013.00640. eCollection 2013.
5
Age-based differences in strategy use in choice tasks.
Front Neurosci. 2012 Jan 6;5:145. doi: 10.3389/fnins.2011.00145. eCollection 2012.
6
Decision-making in stimulant and opiate addicts in protracted abstinence: evidence from computational modeling with pure users.
Front Psychol. 2014 Aug 12;5:849. doi: 10.3389/fpsyg.2014.00849. eCollection 2014.
7
Working-memory load and temporal myopia in dynamic decision making.
J Exp Psychol Learn Mem Cogn. 2012 Nov;38(6):1640-58. doi: 10.1037/a0028146. Epub 2012 Apr 30.
8
Anhedonia and anxiety underlying depressive symptomatology have distinct effects on reward-based decision-making.
PLoS One. 2017 Oct 23;12(10):e0186473. doi: 10.1371/journal.pone.0186473. eCollection 2017.
9
Reward-driven decision-making impairments in schizophrenia.
Schizophr Res. 2019 Apr;206:277-283. doi: 10.1016/j.schres.2018.11.004. Epub 2018 Nov 12.
10
Altered Statistical Learning and Decision-Making in Methamphetamine Dependence: Evidence from a Two-Armed Bandit Task.
Front Psychol. 2015 Dec 18;6:1910. doi: 10.3389/fpsyg.2015.01910. eCollection 2015.

Cited By

1
Electrical brain activations in preadolescents during a probabilistic reward-learning task reflect cognitive processes and behavior strategies.
Front Hum Neurosci. 2025 Jan 30;19:1460584. doi: 10.3389/fnhum.2025.1460584. eCollection 2025.
2
Altered trial-to-trial responses to reward outcomes in KCNMA1 knockout mice during probabilistic learning tasks.
Behav Brain Funct. 2024 Dec 28;20(1):36. doi: 10.1186/s12993-024-00262-x.
3
Risky hybrid foraging: The impact of risk, reward value, and prevalence on foraging behavior in hybrid visual search.
J Exp Psychol Gen. 2024 Nov 14. doi: 10.1037/xge0001652.
4
Fruit bats adjust their decision-making process according to environmental dynamics.
BMC Biol. 2023 Nov 29;21(1):278. doi: 10.1186/s12915-023-01774-0.
5
Impulsivity-related right superior frontal gyrus as a biomarker of internet gaming disorder.
Gen Psychiatr. 2023 Aug 10;36(4):e100985. doi: 10.1136/gpsych-2022-100985. eCollection 2023.
6
Aberrant uncertainty processing is linked to psychotic-like experiences, autistic traits, and is reflected in pupil dilation during probabilistic learning.
Cogn Affect Behav Neurosci. 2023 Jun;23(3):905-919. doi: 10.3758/s13415-023-01088-2. Epub 2023 Mar 28.
7
A guide to area-restricted search: a foundational foraging behaviour.
Biol Rev Camb Philos Soc. 2022 Dec;97(6):2076-2089. doi: 10.1111/brv.12883. Epub 2022 Jul 12.
8
Development of a novel computational model for the Balloon Analogue Risk Task: The Exponential-Weight Mean-Variance Model.
J Math Psychol. 2021 Jun;102. doi: 10.1016/j.jmp.2021.102532. Epub 2021 Apr 21.
9
Scalp recorded theta activity is modulated by reward, direction, and speed during virtual navigation in freely moving humans.
Sci Rep. 2022 Feb 7;12(1):2041. doi: 10.1038/s41598-022-05955-9.
10
The effect of obstructed action efficacy on reward-based decision-making in healthy adolescents: a novel functional MRI task to assay frustration.
Cogn Affect Behav Neurosci. 2022 Jun;22(3):542-556. doi: 10.3758/s13415-021-00975-w. Epub 2021 Dec 29.

References

1
Heterogeneity of strategy use in the Iowa gambling task: a comparison of win-stay/lose-shift and reinforcement learning models.
Psychon Bull Rev. 2013 Apr;20(2):364-71. doi: 10.3758/s13423-012-0324-9.
2
Working-memory load and temporal myopia in dynamic decision making.
J Exp Psychol Learn Mem Cogn. 2012 Nov;38(6):1640-58. doi: 10.1037/a0028146. Epub 2012 Apr 30.
3
Age-based differences in strategy use in choice tasks.
Front Neurosci. 2012 Jan 6;5:145. doi: 10.3389/fnins.2011.00145. eCollection 2012.
4
With age comes wisdom: decision making in younger and older adults.
Psychol Sci. 2011 Nov;22(11):1375-80. doi: 10.1177/0956797611420301. Epub 2011 Sep 29.
5
Comparison of decision learning models using the generalization criterion method.
Cogn Sci. 2008 Dec;32(8):1376-402. doi: 10.1080/03640210802352992.
6
Model-based influences on humans' choices and striatal prediction errors.
Neuron. 2011 Mar 24;69(6):1204-15. doi: 10.1016/j.neuron.2011.02.027.
7
Regulatory fit and systematic exploration in a dynamic decision-making environment.
J Exp Psychol Learn Mem Cogn. 2010 May;36(3):797-804. doi: 10.1037/a0018999.
8
Learning in Noise: Dynamic Decision-Making in a Variable Environment.
J Math Psychol. 2009 Jun;53(3):180-193. doi: 10.1016/j.jmp.2009.02.004.
9
Regulatory fit effects in a choice task.
Psychon Bull Rev. 2007 Dec;14(6):1125-32. doi: 10.3758/bf03193101.
10
Short-term memory traces for action bias in human reinforcement learning.
Brain Res. 2007 Jun 11;1153:111-21. doi: 10.1016/j.brainres.2007.03.057. Epub 2007 Mar 24.