Sutlief Elissa, Walters Charlie, Marton Tanya, Hussain Shuler Marshall G
Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, United States.
Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, United States.
eLife. 2025 Mar 28;13:RP99957. doi: 10.7554/eLife.99957.
Reward-rate maximization is a prominent normative principle in behavioral ecology, neuroscience, economics, and AI. Here, we identify, compare, and analyze equations to maximize reward rate when assessing whether to initiate a pursuit. In deriving expressions for the value of a pursuit, we show that time's cost consists of both apportionment and opportunity cost. Reformulating value as a discounting function, we show precisely how a reward-rate-optimal agent's discounting function (1) combines hyperbolic and linear components reflecting apportionment and opportunity costs, and (2) is dependent not only on the considered pursuit's properties but also on time spent and rewards obtained outside the pursuit. This analysis reveals how purported signs of suboptimal behavior (hyperbolic discounting, and the Delay, Magnitude, and Sign effects) are in fact consistent with reward-rate maximization. To better account for observed decision-making errors in humans and animals, we then analyze the impact of misestimating reward-rate-maximizing parameters and find that suboptimal decisions likely stem from errors in assessing time's apportionment (specifically, underweighting time spent outside versus inside a pursuit), which we term the 'Malapportionment Hypothesis'. This understanding of the true pattern of temporal decision-making errors is essential to deducing the learning algorithms and representational architectures actually used by humans and animals.
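The combination of a linear (opportunity cost) and a hyperbolic (apportionment cost) component can be illustrated with a minimal sketch. It assumes a simplified world in which time is split between one considered pursuit and everything outside it. The symbols `r_in`, `t_in`, `r_out`, `t_out` and the function names are illustrative assumptions that do not appear in the abstract, and the expressions are not necessarily the authors' exact equations.

```python
def global_reward_rate(r_in, t_in, r_out, t_out):
    """Reward per unit time when the pursuit is taken (assumed model)."""
    return (r_in + r_out) / (t_in + t_out)


def subjective_value(r_in, t_in, r_out, t_out):
    """Immediate reward that would yield the same global reward rate.

    Hypothetical decomposition:
      linear term     -> opportunity cost of the time spent in the pursuit
      hyperbolic term -> apportionment of time inside vs. outside the pursuit
    """
    rho_out = r_out / t_out                  # reward rate outside the pursuit
    linear = r_in - rho_out * t_in           # reward minus opportunity cost
    hyperbolic = 1.0 / (1.0 + t_in / t_out)  # apportionment scaling
    return linear * hyperbolic


# Illustrative numbers only (not data from the paper).
sv = subjective_value(r_in=10.0, t_in=5.0, r_out=2.0, t_out=10.0)
rate = global_reward_rate(r_in=10.0, t_in=5.0, r_out=2.0, t_out=10.0)
print(sv, rate)
```

Under these assumptions, the value of the pursuit depends on `r_out` and `t_out` as well as on the pursuit's own reward and duration, which mirrors point (2) of the abstract. Shrinking `t_out` in this sketch steepens the hyperbolic term, which is one way to picture underweighting time spent outside the pursuit.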