Boyd-Meredith J Tyler, Piet Alex T, Kopec Chuck D, Brody Carlos D
Princeton Neuroscience Institute, Princeton University, Princeton, United States.
Sainsbury Wellcome Centre, University College London, London, UK.
bioRxiv. 2024 Jun 20:2024.06.07.597954. doi: 10.1101/2024.06.07.597954.
Rational decision-makers invest more time pursuing rewards they are more confident they will eventually receive. A series of studies has therefore used willingness to wait for delayed rewards as a proxy for decision confidence. However, interpretation of waiting behavior is limited because it is unclear how environmental statistics influence optimal waiting, and how sources of internal variability influence subjects' behavior. We trained rats to perform a confidence-guided waiting task, and derived expressions for optimal waiting that make the relevant environmental statistics explicit, including the time spent traveling from one reward opportunity to the next. We found that rats waited longer than fully optimal agents, but that their behavior was closely matched by optimal agents whose travel times were constrained to match the rats' own. We developed a process model describing the decision to stop waiting as an accumulation-to-bound process, which allowed us to compare the effects of multiple sources of internal variability on waiting. Surprisingly, although mean wait times grew with confidence, their variability did not. This result is inconsistent with scalar-invariant timing and is best explained by variability in the stopping bound. Our results describe a tractable process model that can capture the influence of environmental statistics and internal sources of variability on subjects' decision process during confidence-guided waiting.
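The contrast the abstract draws between scalar-invariant timing and variability in the stopping bound can be illustrated with a toy simulation. The sketch below is not the authors' model: it assumes a deterministic unit-rate accumulator whose bound mean is set by confidence and perturbed by additive Gaussian noise of fixed standard deviation, and it compares this with a scalar timer whose standard deviation is a fixed fraction of its mean. The function name `simulate_wait_times` and the parameters `bound_sd` and `cv` are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_wait_times(mean_wait, n_trials, model, rng, bound_sd=1.0, cv=0.2):
    """Draw wait times under one of two illustrative noise assumptions.

    model="scalar": scalar-invariant timing, SD proportional to the mean.
    model="bound":  unit-rate accumulation to a noisy bound, SD independent
                    of the mean (assumed additive bound noise).
    """
    if model == "scalar":
        # Noise scales with the timed interval: SD = cv * mean_wait.
        return rng.normal(mean_wait, cv * mean_wait, size=n_trials)
    if model == "bound":
        # Confidence shifts the mean bound; trial-to-trial bound noise is fixed.
        rate = 1.0  # assumed deterministic accumulation rate
        bound = rng.normal(mean_wait, bound_sd, size=n_trials)
        return bound / rate  # time at which the accumulator reaches the bound
    raise ValueError(f"unknown model: {model!r}")


rng = np.random.default_rng(0)
mean_waits = [5.0, 10.0]  # hypothetical low- and high-confidence mean waits (s)
for model in ("scalar", "bound"):
    sds = [simulate_wait_times(m, 20_000, model, rng).std() for m in mean_waits]
    print(model, [round(float(sd), 2) for sd in sds])
```

Under these assumptions, doubling the mean wait roughly doubles the standard deviation in the scalar case and leaves it roughly unchanged in the bound-noise case, which is the qualitative pattern the abstract reports for the rats.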