The value of initiating a pursuit in temporal decision-making.

Author information

Sutlief Elissa, Walters Charlie, Marton Tanya, Hussain Shuler Marshall G

Affiliations

Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, United States.

Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, United States.

Publication information

Elife. 2025 Mar 28;13:RP99957. doi: 10.7554/eLife.99957.

Abstract

Reward-rate maximization is a prominent normative principle in behavioral ecology, neuroscience, economics, and AI. Here, we identify, compare, and analyze equations to maximize reward rate when assessing whether to initiate a pursuit. In deriving expressions for the value of a pursuit, we show that time's cost consists of both apportionment and opportunity cost. Reformulating value as a discounting function, we show precisely how a reward-rate-optimal agent's discounting function (1) combines hyperbolic and linear components reflecting apportionment and opportunity costs, and (2) is dependent not only on the considered pursuit's properties but also on time spent and rewards obtained outside the pursuit. This analysis reveals how purported signs of suboptimal behavior (hyperbolic discounting, and the Delay, Magnitude, and Sign effects) are in fact consistent with reward-rate maximization. To better account for observed decision-making errors in humans and animals, we then analyze the impact of misestimating reward-rate-maximizing parameters and find that suboptimal decisions likely stem from errors in assessing time's apportionment (specifically, underweighting time spent outside versus inside a pursuit), which we term the 'Malapportionment Hypothesis'. This understanding of the true pattern of temporal decision-making errors is essential to deducing the learning algorithms and representational architectures actually used by humans and animals.
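The abstract states that a reward-rate-optimal valuation combines a hyperbolic (apportionment) component and a linear (opportunity cost) component, and that it depends on time and reward outside the pursuit. The following is a minimal illustrative sketch of one way such a valuation can arise, not the authors' equations, which the abstract does not reproduce. It assumes a simplified world in which a single pursuit (reward `r_in`, duration `t_in`) alternates with everything else (reward `r_out`, duration `t_out`), and it defines the pursuit's value as the immediate reward that would yield the same overall reward rate. All symbol and function names are hypothetical.

```python
def global_reward_rate(r_in, t_in, r_out, t_out):
    """Overall reward rate when the pursuit is always taken (assumed toy world)."""
    return (r_in + r_out) / (t_in + t_out)


def pursuit_value(r_in, t_in, r_out, t_out):
    """Immediate-reward equivalent of the pursuit under the assumptions above.

    Solving (sv + r_out) / t_out == (r_in + r_out) / (t_in + t_out) for sv gives
    sv = (r_in - rho_out * t_in) / (1 + t_in / t_out).
    """
    rho_out = r_out / t_out                     # reward rate outside the pursuit
    opportunity_cost = rho_out * t_in           # linear in the pursuit's duration
    apportionment = 1.0 / (1.0 + t_in / t_out)  # hyperbolic in the pursuit's duration
    return (r_in - opportunity_cost) * apportionment


def should_initiate(r_in, t_in, r_out, t_out):
    """Initiate when the value is positive, i.e. when r_in / t_in exceeds rho_out."""
    return pursuit_value(r_in, t_in, r_out, t_out) > 0


if __name__ == "__main__":
    # Same pursuit and same outside reward rate, but different outside durations:
    # the value changes even though the pursuit itself does not.
    long_outside = pursuit_value(r_in=4.0, t_in=2.0, r_out=1.0, t_out=8.0)
    short_outside = pursuit_value(r_in=4.0, t_in=2.0, r_out=0.25, t_out=2.0)
    print(long_outside, short_outside)
    print(should_initiate(4.0, 2.0, 1.0, 8.0))
```

In this sketch, shrinking the time spent outside the pursuit steepens the hyperbolic term and lowers the pursuit's value, which is one way to see why misjudging how time is apportioned inside versus outside a pursuit could distort decisions.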

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff41/11952749/616047f0cd42/elife-99957-fig1.jpg
