LaFollette Kyle J, Yuval Janni, Schurr Roey, Melnikoff David, Goldenberg Amit
Department of Psychological Sciences, Case Western Reserve University, Cleveland, OH 44106.
Booth School of Business, University of Chicago, Chicago, IL 60637.
Proc Natl Acad Sci U S A. 2025 Aug 5;122(31):e2413441122. doi: 10.1073/pnas.2413441122. Epub 2025 Jul 31.
Computational models of reinforcement learning (RL) have significantly contributed to our understanding of human behavior and decision-making. Traditional RL models, however, often adopt a linear approach to updating reward expectations, potentially oversimplifying the nuanced relationship between human behavior and rewards. To address these challenges and explore models of RL, we utilized a method of model discovery using equation discovery algorithms. This method, currently used mainly in physics and biology, attempts to capture data by proposing a differential equation from an array of suggested linear and nonlinear functions. Using this method, we were able to identify a model of RL which we termed the Quadratic Q-Weighted model. The model suggests that reward prediction errors obey nonlinear dynamics and exhibit negativity biases, resulting in an underweighting of reward when expectations are low, and an overweighting of the absence of reward when expectations are high. We tested the generalizability of our model by comparing it to classical models used in nine published studies. Our model surpassed traditional models in predictive accuracy across eight out of these nine published datasets, demonstrating not only its generalizability but also its potential to offer insights into the complexities of human learning. This work showcases the integration of a behavioral task with advanced computational methodologies as a potent strategy for uncovering the intricate patterns of human cognition, marking a significant step forward in the development of computational models that are both interpretable and broadly applicable.
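The abstract gives the direction of the nonlinear biases but not the published equation, so the following is a minimal sketch only: it assumes a hypothetical quadratic-in-Q update, dQ = alpha * Q * (r - Q), as a stand-in for the Quadratic Q-Weighted rule, and uses a generic library-plus-sparse-regression step (in the spirit of SINDy-style equation discovery) rather than the authors' actual pipeline. Function names and the candidate library are illustrative assumptions.

```python
# Illustrative sketch, not the authors' model: the quadratic update form and the
# candidate library below are assumptions consistent with the abstract's description.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

def simulate_learner(n_trials=5000, alpha=0.3, p_reward=0.7, q0=0.5):
    """Generate (Q, r, dQ) triples from a learner whose effective learning rate
    scales with the current expectation Q, so the per-trial change in Q is
    quadratic in Q rather than linear (hypothetical form)."""
    q = q0
    qs, rs, dqs = [], [], []
    for _ in range(n_trials):
        r = rng.binomial(1, p_reward)     # binary reward outcome
        delta = r - q                     # reward prediction error
        dq = alpha * q * delta            # hypothetical quadratic-weighted update:
                                          # small when Q is low and r = 1 (underweighted reward),
                                          # large when Q is high and r = 0 (overweighted omission)
        qs.append(q); rs.append(r); dqs.append(dq)
        q += dq
    return np.array(qs), np.array(rs), np.array(dqs)

def discover_update_rule(qs, rs, dqs, reg=1e-4):
    """Express dQ as a sparse combination of candidate linear and nonlinear terms,
    mirroring the library-plus-sparse-regression idea behind equation discovery."""
    library = np.column_stack([
        np.ones_like(qs),   # constant
        qs,                 # Q
        rs,                 # r
        qs ** 2,            # Q^2
        qs * rs,            # Q * r
    ])
    names = ["1", "Q", "r", "Q^2", "Q*r"]
    model = Lasso(alpha=reg, fit_intercept=False, max_iter=50_000).fit(library, dqs)
    return dict(zip(names, np.round(model.coef_, 3)))

if __name__ == "__main__":
    qs, rs, dqs = simulate_learner()
    # The sparse regression should load mainly on the Q*r and Q^2 terms
    # (approximately +alpha and -alpha), leaving the other candidates near zero.
    print(discover_update_rule(qs, rs, dqs))
```

The design point the sketch is meant to convey is the method, not the specific rule: a library of candidate terms in the model variables is proposed, and sparse regression selects the few terms that best explain the observed updates, which is how a nonlinear learning rule can be recovered from behavioral data.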