


Optimal response vigor and choice under non-stationary outcome values.

Affiliations

School of Psychology, UNSW, Sydney, Australia.

Data61, Sydney, Australia.

Publication Info

Psychon Bull Rev. 2019 Feb;26(1):182-204. doi: 10.3758/s13423-018-1500-3.

DOI: 10.3758/s13423-018-1500-3
PMID: 29971644
Abstract

Within a rational framework, a decision-maker selects actions based on the reward-maximization principle, which stipulates that they acquire outcomes with the highest value at the lowest cost. Action selection can be divided into two dimensions: selecting an action from various alternatives, and choosing its vigor, i.e., how fast the selected action should be executed. Both of these dimensions depend on the values of outcomes, which are often affected as more outcomes are consumed together with their associated actions. Despite this, previous research has only addressed the computational substrate of optimal actions in the specific condition that the values of outcomes are constant. It is not known what actions are optimal when the values of outcomes are non-stationary. Here, based on an optimal control framework, we derive a computational model for optimal actions when outcome values are non-stationary. The results imply that, even when the values of outcomes are changing, the optimal response rate is constant rather than decreasing. This finding shows that, in contrast to previous theories, commonly observed changes in action rate cannot be attributed solely to changes in outcome value. We then prove that this observation can be explained based on uncertainty about temporal horizons; e.g., the session duration. We further show that, when multiple outcomes are available, the model explains probability matching as well as maximization strategies. The model therefore provides a quantitative analysis of optimal action and explicit predictions for future testing.
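The abstract notes that with multiple outcomes the model accounts for both probability matching and maximization strategies. A minimal sketch (not from the paper; the probabilities and function names are illustrative assumptions) of why these strategies differ in expected reward under stationary payoffs:

```python
# Hypothetical illustration: expected reward under probability matching
# vs. maximization for two options with fixed payoff probabilities.
def expected_reward(choice_probs, reward_probs):
    """Expected reward when option i is chosen with probability choice_probs[i]."""
    return sum(c * r for c, r in zip(choice_probs, reward_probs))

reward_probs = [0.7, 0.3]  # assumed payoff probabilities for the two options

# Matching: choose each option in proportion to its payoff probability.
matching = expected_reward(reward_probs, reward_probs)    # 0.7*0.7 + 0.3*0.3 = 0.58

# Maximizing: always choose the higher-payoff option.
maximizing = expected_reward([1.0, 0.0], reward_probs)    # 0.7
```

Under these stationary assumptions maximizing strictly outperforms matching (0.7 vs. 0.58); the paper's contribution is analyzing the non-stationary case, where this simple comparison no longer applies directly.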


Similar Articles

1. Optimal response vigor and choice under non-stationary outcome values.
Psychon Bull Rev. 2019 Feb;26(1):182-204. doi: 10.3758/s13423-018-1500-3.
2. Tonic dopamine: opportunity costs and the control of response vigor.
Psychopharmacology (Berl). 2007 Apr;191(3):507-20. doi: 10.1007/s00213-006-0502-4. Epub 2006 Oct 10.
3. Reward Maximization Through Discrete Active Inference.
Neural Comput. 2023 Apr 18;35(5):807-852. doi: 10.1162/neco_a_01574.
4. Dopamine Manipulation Affects Response Vigor Independently of Opportunity Cost.
J Neurosci. 2016 Sep 14;36(37):9516-25. doi: 10.1523/JNEUROSCI.4467-15.2016.
5. Metaplasticity as a Neural Substrate for Adaptive Learning and Choice under Uncertainty.
Neuron. 2017 Apr 19;94(2):401-414.e6. doi: 10.1016/j.neuron.2017.03.044.
6. Reward and avoidance learning in the context of aversive environments and possible implications for depressive symptoms.
Psychopharmacology (Berl). 2019 Aug;236(8):2437-2449. doi: 10.1007/s00213-019-05299-9. Epub 2019 Jun 28.
7. When does reward maximization lead to matching law?
PLoS One. 2008;3(11):e3795. doi: 10.1371/journal.pone.0003795. Epub 2008 Nov 24.
8. Non-action Learning: Saving Action-Associated Cost Serves as a Covert Reward.
Front Behav Neurosci. 2020 Sep 4;14:141. doi: 10.3389/fnbeh.2020.00141. eCollection 2020.
9. Self-choice enhances value in reward-seeking in primates.
Neurosci Res. 2014 Mar;80:45-54. doi: 10.1016/j.neures.2014.01.004. Epub 2014 Jan 22.
10. Optimal decision making and matching are tied through diminishing returns.
Proc Natl Acad Sci U S A. 2017 Aug 8;114(32):8499-8504. doi: 10.1073/pnas.1703440114. Epub 2017 Jul 24.

Cited By

1. Adaptation of sequential action benefits from timing variability related to lateral basal ganglia circuitry.
iScience. 2024 Feb 20;27(3):109274. doi: 10.1016/j.isci.2024.109274. eCollection 2024 Mar 15.

References

1. Optimal decision making and matching are tied through diminishing returns.
Proc Natl Acad Sci U S A. 2017 Aug 8;114(32):8499-8504. doi: 10.1073/pnas.1703440114. Epub 2017 Jul 24.
2. Taking the easy way out? Increasing implementation effort reduces probability maximizing under cognitive load.
Mem Cognit. 2016 Jul;44(5):806-18. doi: 10.3758/s13421-016-0595-x.
3. Of matchers and maximizers: How competition shapes choice under risk and uncertainty.
Cogn Psychol. 2015 May;78:78-98. doi: 10.1016/j.cogpsych.2015.03.002. Epub 2015 Apr 8.
4. Some work and some play: microscopic and macroscopic approaches to labor and leisure.
PLoS Comput Biol. 2014 Dec 4;10(12):e1003894. doi: 10.1371/journal.pcbi.1003894. eCollection 2014 Dec.
5. Homeostatic reinforcement learning for integrating reward collection and physiological stability.
Elife. 2014 Dec 2;3:e04811. doi: 10.7554/eLife.04811.
6. Motor costs and the coordination of the two arms.
J Neurosci. 2014 Jan 29;34(5):1806-18. doi: 10.1523/JNEUROSCI.3095-13.2014.
7. Dynamical regimes in neural network models of matching behavior.
Neural Comput. 2013 Dec;25(12):3093-112. doi: 10.1162/NECO_a_00522. Epub 2013 Sep 18.
8. Rational temporal predictions can underlie apparent failures to delay gratification.
Psychol Rev. 2013 Apr;120(2):395-410. doi: 10.1037/a0031910. Epub 2013 Mar 4.
9. Within- and between-session variety effects in a food-seeking habituation paradigm.
Appetite. 2013 Jul;66:10-9. doi: 10.1016/j.appet.2013.01.025. Epub 2013 Feb 19.
10. An examination of the generalizability of motor costs.
PLoS One. 2013;8(1):e53759. doi: 10.1371/journal.pone.0053759. Epub 2013 Jan 14.