Optimal response vigor and choice under non-stationary outcome values.

Affiliations

School of Psychology, UNSW, Sydney, Australia.

Data61, Sydney, Australia.

Publication information

Psychon Bull Rev. 2019 Feb;26(1):182-204. doi: 10.3758/s13423-018-1500-3.

DOI:10.3758/s13423-018-1500-3

PMID:29971644

Abstract

Within a rational framework, a decision-maker selects actions based on the reward-maximization principle, which stipulates that they acquire outcomes with the highest value at the lowest cost. Action selection can be divided into two dimensions: selecting an action from various alternatives, and choosing its vigor, i.e., how fast the selected action should be executed. Both of these dimensions depend on the values of outcomes, which are often affected as more outcomes are consumed together with their associated actions. Despite this, previous research has only addressed the computational substrate of optimal actions in the specific condition that the values of outcomes are constant. It is not known what actions are optimal when the values of outcomes are non-stationary. Here, based on an optimal control framework, we derive a computational model for optimal actions when outcome values are non-stationary. The results imply that, even when the values of outcomes are changing, the optimal response rate is constant rather than decreasing. This finding shows that, in contrast to previous theories, commonly observed changes in action rate cannot be attributed solely to changes in outcome value. We then prove that this observation can be explained based on uncertainty about temporal horizons; e.g., the session duration. We further show that, when multiple outcomes are available, the model explains probability matching as well as maximization strategies. The model therefore provides a quantitative analysis of optimal action and explicit predictions for future testing.

Similar articles

Tonic dopamine: opportunity costs and the control of response vigor.

Psychopharmacology (Berl). 2007 Apr;191(3):507-20. doi: 10.1007/s00213-006-0502-4. Epub 2006 Oct 10.

Reward Maximization Through Discrete Active Inference.

Neural Comput. 2023 Apr 18;35(5):807-852. doi: 10.1162/neco_a_01574.

Dopamine Manipulation Affects Response Vigor Independently of Opportunity Cost.

J Neurosci. 2016 Sep 14;36(37):9516-25. doi: 10.1523/JNEUROSCI.4467-15.2016.

Metaplasticity as a Neural Substrate for Adaptive Learning and Choice under Uncertainty.

Neuron. 2017 Apr 19;94(2):401-414.e6. doi: 10.1016/j.neuron.2017.03.044.

Reward and avoidance learning in the context of aversive environments and possible implications for depressive symptoms.

Psychopharmacology (Berl). 2019 Aug;236(8):2437-2449. doi: 10.1007/s00213-019-05299-9. Epub 2019 Jun 28.

When does reward maximization lead to matching law?

PLoS One. 2008;3(11):e3795. doi: 10.1371/journal.pone.0003795. Epub 2008 Nov 24.

Non-action Learning: Saving Action-Associated Cost Serves as a Covert Reward.

Front Behav Neurosci. 2020 Sep 4;14:141. doi: 10.3389/fnbeh.2020.00141. eCollection 2020.

Self-choice enhances value in reward-seeking in primates.

Neurosci Res. 2014 Mar;80:45-54. doi: 10.1016/j.neures.2014.01.004. Epub 2014 Jan 22.

Optimal decision making and matching are tied through diminishing returns.

Proc Natl Acad Sci U S A. 2017 Aug 8;114(32):8499-8504. doi: 10.1073/pnas.1703440114. Epub 2017 Jul 24.

Cited by

Adaptation of sequential action benefits from timing variability related to lateral basal ganglia circuitry.

iScience. 2024 Feb 20;27(3):109274. doi: 10.1016/j.isci.2024.109274. eCollection 2024 Mar 15.

References

Optimal decision making and matching are tied through diminishing returns.

Proc Natl Acad Sci U S A. 2017 Aug 8;114(32):8499-8504. doi: 10.1073/pnas.1703440114. Epub 2017 Jul 24.

Taking the easy way out? Increasing implementation effort reduces probability maximizing under cognitive load.

Mem Cognit. 2016 Jul;44(5):806-18. doi: 10.3758/s13421-016-0595-x.

Of matchers and maximizers: How competition shapes choice under risk and uncertainty.

Cogn Psychol. 2015 May;78:78-98. doi: 10.1016/j.cogpsych.2015.03.002. Epub 2015 Apr 8.

Some work and some play: microscopic and macroscopic approaches to labor and leisure.

PLoS Comput Biol. 2014 Dec 4;10(12):e1003894. doi: 10.1371/journal.pcbi.1003894. eCollection 2014 Dec.

Homeostatic reinforcement learning for integrating reward collection and physiological stability.

Elife. 2014 Dec 2;3:e04811. doi: 10.7554/eLife.04811.

Motor costs and the coordination of the two arms.

J Neurosci. 2014 Jan 29;34(5):1806-18. doi: 10.1523/JNEUROSCI.3095-13.2014.

Dynamical regimes in neural network models of matching behavior.

Neural Comput. 2013 Dec;25(12):3093-112. doi: 10.1162/NECO_a_00522. Epub 2013 Sep 18.

Rational temporal predictions can underlie apparent failures to delay gratification.

Psychol Rev. 2013 Apr;120(2):395-410. doi: 10.1037/a0031910. Epub 2013 Mar 4.

Within- and between-session variety effects in a food-seeking habituation paradigm.

Appetite. 2013 Jul;66:10-9. doi: 10.1016/j.appet.2013.01.025. Epub 2013 Feb 19.

An examination of the generalizability of motor costs.

PLoS One. 2013;8(1):e53759. doi: 10.1371/journal.pone.0053759. Epub 2013 Jan 14.
