School of Psychology, UNSW, Sydney, Australia.
Data61, Sydney, Australia.
Psychon Bull Rev. 2019 Feb;26(1):182-204. doi: 10.3758/s13423-018-1500-3.
Within a rational framework, a decision-maker selects actions according to the reward-maximization principle, which stipulates that they acquire the outcomes with the highest value at the lowest cost. Action selection can be divided into two dimensions: selecting an action from the available alternatives, and choosing its vigor, i.e., how fast the selected action should be executed. Both dimensions depend on the values of outcomes, which often change as more outcomes are consumed together with their associated actions. Despite this, previous research has addressed the computational substrate of optimal actions only for the specific case in which outcome values are constant. It is not known which actions are optimal when outcome values are non-stationary. Here, based on an optimal control framework, we derive a computational model of optimal actions under non-stationary outcome values. The results imply that, even when outcome values are changing, the optimal response rate is constant rather than decreasing. This finding shows that, in contrast to previous theories, commonly observed changes in action rate cannot be attributed solely to changes in outcome value. We then prove that such changes in action rate can instead be explained by uncertainty about temporal horizons, e.g., the session duration. We further show that, when multiple outcomes are available, the model explains probability matching as well as maximization strategies. The model therefore provides a quantitative analysis of optimal action and explicit predictions for future testing.
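The constant-rate result can be illustrated with a toy calculation. The sketch below is not the paper's model. It assumes a known session duration `T`, no temporal discounting, a total reward that depends only on how many outcomes are earned, and a hypothetical response cost of `a / tau`, where `tau` is the time spent on a response. Under these assumptions, any two schedules with the same number of responses earn the same reward, however outcome values change with consumption, so the cheaper schedule is the better one. All function names and parameter values are illustrative.

```python
def total_cost(intervals, a=1.0):
    """Total effort cost, assuming each response costs a / tau (an assumption)."""
    return sum(a / tau for tau in intervals)


def constant_schedule(n, T):
    """n responses spaced evenly over a session of duration T."""
    return [T / n] * n


def slowing_schedule(n, T, ratio=1.2):
    """n responses whose inter-response times grow geometrically.

    The intervals are rescaled so that they still sum to T, which makes
    this schedule earn exactly as many outcomes as the constant one.
    """
    raw = [ratio ** i for i in range(n)]
    scale = T / sum(raw)
    return [r * scale for r in raw]


n, T = 10, 60.0  # hypothetical: 10 responses in a 60-second session
const = constant_schedule(n, T)
slow = slowing_schedule(n, T)

cost_const = total_cost(const)
cost_slow = total_cost(slow)

print(f"constant rate cost: {cost_const:.4f}")
print(f"slowing rate cost:  {cost_slow:.4f}")
```

Because `a / tau` is convex in `tau`, spreading the same total time unevenly across responses always raises the summed cost, so the evenly spaced schedule is the cheapest way to earn a given number of outcomes. When the session duration is uncertain, this argument no longer applies directly, which is the case the abstract attributes to changing response rates.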