

A Comparison Model of Reinforcement-Learning and Win-Stay-Lose-Shift Decision-Making Processes: A Tribute to W.K. Estes.

Authors

Worthy Darrell A, Maddox W Todd

Affiliations

Texas A&M University.

The University of Texas at Austin.

Publication

J Math Psychol. 2014 Apr 1;59:41-49. doi: 10.1016/j.jmp.2013.10.001.

DOI: 10.1016/j.jmp.2013.10.001
PMID: 25214675
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4159167/
Abstract

W.K. Estes often championed an approach to model development whereby an existing model was augmented by the addition of one or more free parameters, and a comparison between the simple and more complex, augmented model determined whether the additions were justified. Following this same approach we utilized Estes' (1950) own augmented learning equations to improve the fit and plausibility of a win-stay-lose-shift (WSLS) model that we have used in much of our recent work. Estes also championed models that assumed a comparison between multiple concurrent cognitive processes. In line with this, we develop a WSLS-Reinforcement Learning (RL) model that assumes that the output of a WSLS process that provides a probability of staying or switching to a different option based on the last two decision outcomes is compared with the output of an RL process that determines a probability of selecting each option based on a comparison of the expected value of each option. Fits to data from three different decision-making experiments suggest that the augmentations to the WSLS and RL models lead to a better account of decision-making behavior. Our results also support the assertion that human participants weigh the output of WSLS and RL processes during decision-making.
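The abstract describes a comparison between two concurrent processes: a WSLS rule that outputs stay/shift probabilities from recent outcomes, and an RL process that outputs choice probabilities from expected values, with participants weighing the two outputs. The following is a minimal illustrative sketch of that idea, not the authors' actual model: it assumes a simplified one-trial-back WSLS rule (the paper's augmented model conditions on the last two outcomes), and the parameter names (`w`, `gamma`, `p_stay_win`, `p_shift_loss`) are hypothetical.

```python
import numpy as np

def softmax(values, gamma):
    """RL process: convert expected values into choice probabilities."""
    v = gamma * np.asarray(values, dtype=float)
    v -= v.max()                      # subtract max for numerical stability
    e = np.exp(v)
    return e / e.sum()

def wsls_probs(last_choice, last_reward, p_stay_win, p_shift_loss, n_options=2):
    """WSLS process (one-trial-back simplification): stay after a win,
    shift after a loss, each with its own probability."""
    p_stay = p_stay_win if last_reward else 1.0 - p_shift_loss
    p = np.full(n_options, (1.0 - p_stay) / (n_options - 1))
    p[last_choice] = p_stay
    return p

def mixture_probs(q_values, last_choice, last_reward, w,
                  gamma, p_stay_win, p_shift_loss):
    """Weighted comparison of the two outputs; w is the weight on RL."""
    p_rl = softmax(q_values, gamma)
    p_wsls = wsls_probs(last_choice, last_reward, p_stay_win, p_shift_loss,
                        n_options=len(q_values))
    return w * p_rl + (1.0 - w) * p_wsls

# Example: after a rewarded choice of option 0, both processes favor option 0.
p = mixture_probs([1.0, 0.5], last_choice=0, last_reward=True,
                  w=0.5, gamma=2.0, p_stay_win=0.9, p_shift_loss=0.8)
```

In a model-fitting context, `w` and the WSLS parameters would be free parameters estimated per participant, which is how one would test whether the augmentations are justified in Estes' sense.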


Similar Articles

1
A Comparison Model of Reinforcement-Learning and Win-Stay-Lose-Shift Decision-Making Processes: A Tribute to W.K. Estes.
J Math Psychol. 2014 Apr 1;59:41-49. doi: 10.1016/j.jmp.2013.10.001.
2
Heterogeneity of strategy use in the Iowa gambling task: a comparison of win-stay/lose-shift and reinforcement learning models.
Psychon Bull Rev. 2013 Apr;20(2):364-71. doi: 10.3758/s13423-012-0324-9.
3
Probing relationships between reinforcement learning and simple behavioral strategies to understand probabilistic reward learning.
J Neurosci Methods. 2020 Jul 15;341:108777. doi: 10.1016/j.jneumeth.2020.108777. Epub 2020 May 15.
4
Decomposing the roles of perseveration and expected value representation in models of the Iowa gambling task.
Front Psychol. 2013 Sep 30;4:640. doi: 10.3389/fpsyg.2013.00640. eCollection 2013.
5
Age-based differences in strategy use in choice tasks.
Front Neurosci. 2012 Jan 6;5:145. doi: 10.3389/fnins.2011.00145. eCollection 2012.
6
Decision-making in stimulant and opiate addicts in protracted abstinence: evidence from computational modeling with pure users.
Front Psychol. 2014 Aug 12;5:849. doi: 10.3389/fpsyg.2014.00849. eCollection 2014.
7
Working-memory load and temporal myopia in dynamic decision making.
J Exp Psychol Learn Mem Cogn. 2012 Nov;38(6):1640-58. doi: 10.1037/a0028146. Epub 2012 Apr 30.
8
Anhedonia and anxiety underlying depressive symptomatology have distinct effects on reward-based decision-making.
PLoS One. 2017 Oct 23;12(10):e0186473. doi: 10.1371/journal.pone.0186473. eCollection 2017.
9
Reward-driven decision-making impairments in schizophrenia.
Schizophr Res. 2019 Apr;206:277-283. doi: 10.1016/j.schres.2018.11.004. Epub 2018 Nov 12.
10
Altered Statistical Learning and Decision-Making in Methamphetamine Dependence: Evidence from a Two-Armed Bandit Task.
Front Psychol. 2015 Dec 18;6:1910. doi: 10.3389/fpsyg.2015.01910. eCollection 2015.

Cited By

1
Electrical brain activations in preadolescents during a probabilistic reward-learning task reflect cognitive processes and behavior strategies.
Front Hum Neurosci. 2025 Jan 30;19:1460584. doi: 10.3389/fnhum.2025.1460584. eCollection 2025.
2
Altered trial-to-trial responses to reward outcomes in KCNMA1 knockout mice during probabilistic learning tasks.
Behav Brain Funct. 2024 Dec 28;20(1):36. doi: 10.1186/s12993-024-00262-x.
3
Risky hybrid foraging: The impact of risk, reward value, and prevalence on foraging behavior in hybrid visual search.
J Exp Psychol Gen. 2024 Nov 14. doi: 10.1037/xge0001652.
4
Fruit bats adjust their decision-making process according to environmental dynamics.
BMC Biol. 2023 Nov 29;21(1):278. doi: 10.1186/s12915-023-01774-0.
5
Impulsivity-related right superior frontal gyrus as a biomarker of internet gaming disorder.
Gen Psychiatr. 2023 Aug 10;36(4):e100985. doi: 10.1136/gpsych-2022-100985. eCollection 2023.
6
Aberrant uncertainty processing is linked to psychotic-like experiences, autistic traits, and is reflected in pupil dilation during probabilistic learning.
Cogn Affect Behav Neurosci. 2023 Jun;23(3):905-919. doi: 10.3758/s13415-023-01088-2. Epub 2023 Mar 28.
7
A guide to area-restricted search: a foundational foraging behaviour.
Biol Rev Camb Philos Soc. 2022 Dec;97(6):2076-2089. doi: 10.1111/brv.12883. Epub 2022 Jul 12.
8
Development of a novel computational model for the Balloon Analogue Risk Task: The Exponential-Weight Mean-Variance Model.
J Math Psychol. 2021 Jun;102. doi: 10.1016/j.jmp.2021.102532. Epub 2021 Apr 21.
9
Scalp recorded theta activity is modulated by reward, direction, and speed during virtual navigation in freely moving humans.
Sci Rep. 2022 Feb 7;12(1):2041. doi: 10.1038/s41598-022-05955-9.
10
The effect of obstructed action efficacy on reward-based decision-making in healthy adolescents: a novel functional MRI task to assay frustration.
Cogn Affect Behav Neurosci. 2022 Jun;22(3):542-556. doi: 10.3758/s13415-021-00975-w. Epub 2021 Dec 29.

References

1
Heterogeneity of strategy use in the Iowa gambling task: a comparison of win-stay/lose-shift and reinforcement learning models.
Psychon Bull Rev. 2013 Apr;20(2):364-71. doi: 10.3758/s13423-012-0324-9.
2
Working-memory load and temporal myopia in dynamic decision making.
J Exp Psychol Learn Mem Cogn. 2012 Nov;38(6):1640-58. doi: 10.1037/a0028146. Epub 2012 Apr 30.
3
Age-based differences in strategy use in choice tasks.
Front Neurosci. 2012 Jan 6;5:145. doi: 10.3389/fnins.2011.00145. eCollection 2012.
4
With age comes wisdom: decision making in younger and older adults.
Psychol Sci. 2011 Nov;22(11):1375-80. doi: 10.1177/0956797611420301. Epub 2011 Sep 29.
5
Comparison of decision learning models using the generalization criterion method.
Cogn Sci. 2008 Dec;32(8):1376-402. doi: 10.1080/03640210802352992.
6
Model-based influences on humans' choices and striatal prediction errors.
Neuron. 2011 Mar 24;69(6):1204-15. doi: 10.1016/j.neuron.2011.02.027.
7
Regulatory fit and systematic exploration in a dynamic decision-making environment.
J Exp Psychol Learn Mem Cogn. 2010 May;36(3):797-804. doi: 10.1037/a0018999.
8
Learning in Noise: Dynamic Decision-Making in a Variable Environment.
J Math Psychol. 2009 Jun;53(3):180-193. doi: 10.1016/j.jmp.2009.02.004.
9
Regulatory fit effects in a choice task.
Psychon Bull Rev. 2007 Dec;14(6):1125-32. doi: 10.3758/bf03193101.
10
Short-term memory traces for action bias in human reinforcement learning.
Brain Res. 2007 Jun 11;1153:111-21. doi: 10.1016/j.brainres.2007.03.057. Epub 2007 Mar 24.