Cox David J, Santos Carlos
Institute of Applied Behavioral Science at Endicott College, Beverly, MA, USA.
Mosaic Pediatric Therapy, Charlotte, NC, USA.
Perspect Behav Sci. 2025 Apr 30;48(2):241-267. doi: 10.1007/s40614-025-00444-6. eCollection 2025 Jun.
The concepts of reinforcement and punishment arose in two disparate scientific domains: psychology and artificial intelligence (AI). Behavior scientists study how biological organisms behave as a function of their environment, whereas AI researchers study how artificial agents behave so as to maximize reward or minimize punishment. This article describes the broad characteristics of AI-based reinforcement learning (RL), how those differ from operant research, and how combining insights from each might advance research in both domains. To demonstrate this mutual utility, 12 artificial organisms (AOs) were built to predict, for each of six participants, the next response the participant would emit. Each AO used one of six feature-set combinations informed by operant research, either with or without punishment of incorrect predictions. A 13th predictive approach, termed "human choice modeled by Q-learning," used the mechanism of Q-learning to update context-response-outcome values following each response and to choose the next response. This approach achieved the highest average predictive accuracy at 95% (range: 90%-99%). The next highest accuracy, averaging 89% (range: 85%-93%), required both molecular and molar information as well as punishment contingencies. Predictions based on only molar or only molecular information, with punishment contingencies, averaged 71%-72% accuracy. Without punishment, prediction accuracy dropped to 47%-54% regardless of the feature set. This work highlights how AI-based RL techniques, combined with operant and respondent domain knowledge, can enhance behavior scientists' ability to predict the behavior of organisms. These techniques also allow researchers to address theoretical questions about important topics such as multiscale models of behavior and the role of punishment in learning.
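For readers unfamiliar with the mechanism the abstract names, the sketch below is a minimal tabular Q-learning loop in Python: values for context-response pairs are updated after each observed response and used to predict the next one. The response set, the parameter values (alpha, gamma, epsilon), and the +1/-1 reward scheme are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

# Illustrative parameters (assumed, not from the paper)
ALPHA = 0.1      # learning rate
GAMMA = 0.9      # discount on future value
EPSILON = 0.05   # exploration rate for response selection
RESPONSES = ["left", "right"]  # hypothetical response options

# Q maps (context, response) pairs to learned values,
# standing in for the abstract's context-response-outcome values.
Q = defaultdict(float)

def predict(context):
    """Choose the response with the highest Q-value (epsilon-greedy)."""
    if random.random() < EPSILON:
        return random.choice(RESPONSES)
    return max(RESPONSES, key=lambda r: Q[(context, r)])

def update(context, response, outcome, next_context):
    """Standard Q-learning update applied after each observed response.

    Under a punishment contingency, `outcome` is +1 for a correct
    prediction and -1 for an incorrect one; without punishment,
    an incorrect prediction would simply yield 0.
    """
    best_next = max(Q[(next_context, r)] for r in RESPONSES)
    td_target = outcome + GAMMA * best_next
    Q[(context, response)] += ALPHA * (td_target - Q[(context, response)])

# One trial of the loop: predict, observe, score, update.
context = "trial_context"
predicted = predict(context)
actual = random.choice(RESPONSES)               # stand-in for observed behavior
outcome = 1.0 if predicted == actual else -1.0  # punishment contingency
update(context, predicted, outcome, next_context="trial_context")
```

The with/without-punishment comparison reported in the abstract corresponds to toggling whether incorrect predictions receive a negative outcome or a neutral one in an update rule of this kind.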