Data-driven equation discovery reveals nonlinear reinforcement learning in humans.

Author Information

LaFollette Kyle J, Yuval Janni, Schurr Roey, Melnikoff David, Goldenberg Amit

Affiliations

Department of Psychological Sciences, Case Western Reserve University, Cleveland, OH 44106.

Booth School of Business, University of Chicago, Chicago, IL 60637.

Publication Information

Proc Natl Acad Sci U S A. 2025 Aug 5;122(31):e2413441122. doi: 10.1073/pnas.2413441122. Epub 2025 Jul 31.

DOI: 10.1073/pnas.2413441122
PMID: 40743390
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12337339/
Abstract

Computational models of reinforcement learning (RL) have significantly contributed to our understanding of human behavior and decision-making. Traditional RL models, however, often adopt a linear approach to updating reward expectations, potentially oversimplifying the nuanced relationship between human behavior and rewards. To address these challenges and explore models of RL, we utilized a method of model discovery using equation discovery algorithms. This method, currently used mainly in physics and biology, attempts to capture data by proposing a differential equation from an array of suggested linear and nonlinear functions. Using this method, we were able to identify a model of RL which we termed the Quadratic Q-Weighted model. The model suggests that reward prediction errors obey nonlinear dynamics and exhibit negativity biases, resulting in an underweighting of reward when expectations are low, and an overweighting of the absence of reward when expectations are high. We tested the generalizability of our model by comparing it to classical models used in nine published studies. Our model surpassed traditional models in predictive accuracy across eight out of these nine published datasets, demonstrating not only its generalizability but also its potential to offer insights into the complexities of human learning. This work showcases the integration of a behavioral task with advanced computational methodologies as a potent strategy for uncovering the intricate patterns of human cognition, marking a significant step forward in the development of computational models that are both interpretable and broadly applicable.
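
To make the method concrete, the sketch below walks through the general equation-discovery workflow the abstract describes: simulate trial-by-trial changes in a reward expectation Q, build a library of candidate linear and nonlinear terms in Q and the reward R, and use sparse regression to select the few terms that best explain the observed updates. For reference, a classical linear model such as Rescorla-Wagner updates the expectation as Q ← Q + α(R − Q) with a fixed learning rate α. Everything in the sketch is an illustrative assumption: the ground-truth update rule (a Q-dependent learning rate), the candidate library, the parameter values, and the thresholded least-squares fit are stand-ins chosen to show the idea, not the authors' actual task, model, or analysis pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth update (an assumption for illustration only, NOT the
# paper's Quadratic Q-Weighted equation): the effective learning rate grows with
# the current expectation Q, so a delivered reward (R = 1) is underweighted when
# Q is low and an omitted reward (R = 0) is overweighted when Q is high.
a, b = 0.15, 0.45

Q, R, dQ = [], [], []
for _ in range(50):                        # 50 simulated blocks/participants
    q = rng.random()                       # random initial expectation
    p_reward = rng.uniform(0.1, 0.9)       # random reward probability per block
    for _ in range(100):                   # 100 trials per block
        r = float(rng.random() < p_reward)
        delta_q = (a + b * q) * (r - q)    # expectation-dependent (nonlinear) update
        Q.append(q); R.append(r); dQ.append(delta_q)
        q += delta_q

Q, R = np.asarray(Q), np.asarray(R)
dQ = np.asarray(dQ) + 0.005 * rng.standard_normal(len(Q))   # small observation noise

# Library of candidate linear and nonlinear terms in (Q, R), analogous to the
# "array of suggested linear and nonlinear functions" mentioned in the abstract.
names = ["1", "Q", "R", "Q*R", "Q^2", "Q^2*R"]
Theta = np.column_stack([np.ones_like(Q), Q, R, Q * R, Q**2, Q**2 * R])

# Sequentially thresholded least squares (a SINDy-style sparse regression):
# fit, prune negligible coefficients, and refit on the surviving terms.
coef = np.linalg.lstsq(Theta, dQ, rcond=None)[0]
for _ in range(10):
    coef[np.abs(coef) < 0.02] = 0.0
    keep = coef != 0.0
    if keep.any():
        coef[keep] = np.linalg.lstsq(Theta[:, keep], dQ, rcond=None)[0]

print("discovered terms in dQ (coefficient * term):")
for c, n in zip(coef, names):
    if c != 0.0:
        print(f"  {c:+.3f} * {n}")
# Expected, up to noise: -0.150*Q, +0.150*R, +0.450*Q*R, -0.450*Q^2
```

The assumed ground-truth rule, ΔQ = (a + bQ)(R − Q), was picked only because it qualitatively reproduces the asymmetry the abstract reports: when Q is low a delivered reward moves the expectation relatively little, and when Q is high an omitted reward moves it a lot. Expanding the product gives terms up to Q², which is why a quadratic candidate library suffices for this toy recovery; the paper's actual discovered equation may differ.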

Figures (PMC12337339):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3ad5/12337339/14b12b9d34e4/pnas.2413441122fig01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3ad5/12337339/06ca3ca38e1c/pnas.2413441122fig02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3ad5/12337339/c8a7ee5f72ab/pnas.2413441122fig03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3ad5/12337339/6c10d9fcc50e/pnas.2413441122fig04.jpg

Similar Articles

1. Data-driven equation discovery reveals nonlinear reinforcement learning in humans.
   Proc Natl Acad Sci U S A. 2025 Aug 5;122(31):e2413441122. doi: 10.1073/pnas.2413441122. Epub 2025 Jul 31.
2. Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
   Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
3. Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.
   Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.
4. Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?
   Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.
5. Does the Presence of Missing Data Affect the Performance of the SORG Machine-learning Algorithm for Patients With Spinal Metastasis? Development of an Internet Application Algorithm.
   Clin Orthop Relat Res. 2024 Jan 1;482(1):143-157. doi: 10.1097/CORR.0000000000002706. Epub 2023 Jun 12.
6. Do autistic individuals show atypical performance in probabilistic learning? A comparison of cue-number, predictive strength, and prediction error.
   Mol Autism. 2025 Mar 4;16(1):15. doi: 10.1186/s13229-025-00651-7.
7. Short-Term Memory Impairment
8. Immunogenicity and seroefficacy of pneumococcal conjugate vaccines: a systematic review and network meta-analysis.
   Health Technol Assess. 2024 Jul;28(34):1-109. doi: 10.3310/YWHA3079.
9. Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.
   Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
10. Factors that influence caregivers' and adolescents' views and practices regarding human papillomavirus (HPV) vaccination for adolescents: a qualitative evidence synthesis.
    Cochrane Database Syst Rev. 2025 Apr 15;4(4):CD013430. doi: 10.1002/14651858.CD013430.pub2.
