Predicting the Next Response: Demonstrating the Utility of Integrating Artificial Intelligence-Based Reinforcement Learning with Behavior Science.

Authors

Cox David J, Santos Carlos

Affiliations

Institute of Applied Behavioral Science at Endicott College, Beverly, MA USA.

Mosaic Pediatric Therapy, Charlotte, NC USA.

Publication Information

Perspect Behav Sci. 2025 Apr 30;48(2):241-267. doi: 10.1007/s40614-025-00444-6. eCollection 2025 Jun.

Abstract

The concepts of reinforcement and punishment arose in two disparate scientific domains: psychology and artificial intelligence (AI). Behavior scientists study how biological organisms behave as a function of their environment, whereas AI focuses on how artificial agents behave to maximize reward or minimize punishment. This article describes the broad characteristics of AI-based reinforcement learning (RL), how those differ from operant research, and how combining insights from each might advance research in both domains. To demonstrate this mutual utility, 12 artificial organisms (AOs) were built for six participants to predict the next response each participant emitted. Each AO used one of six combinations of feature sets informed by operant research, with or without punishment of incorrect predictions. A 13th predictive approach, termed "human choice modeled by Q-learning," uses the mechanism of Q-learning to update context-response-outcome values following each response and to choose the next response. This approach achieved the highest average predictive accuracy, 95% (range: 90%-99%). The next highest accuracy, averaging 89% (range: 85%-93%), required both molecular and molar information as well as punishment contingencies. Predictions based only on molar or molecular information, with punishment contingencies, averaged 71%-72% accuracy. Without punishment, prediction accuracy dropped to 47%-54%, regardless of the feature set. This work highlights how AI-based RL techniques, combined with operant and respondent domain knowledge, can enhance behavior scientists' ability to predict the behavior of organisms. These techniques also allow researchers to address theoretical questions about important topics such as multiscale models of behavior and the role of punishment in learning.
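The Q-learning mechanism the abstract describes can be illustrated with a minimal sketch. This is not the authors' "human choice modeled by Q-learning" implementation; the two contexts, two response options, the toy participant's response probabilities, and the +1/-1 reward scheme are all hypothetical stand-ins, chosen only to show how context-response values are updated after each response and then used to predict the next one.

```python
import random

def train_q_learning(episodes, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn context-response values from a toy response stream.

    Illustrative only: contexts, responses, and rewards are hypothetical.
    """
    rng = random.Random(seed)
    actions = ["left", "right"]

    # Toy participant: in context "A" they usually respond "left";
    # in context "B" they usually respond "right".
    def participant_response(context):
        if context == "A":
            return "left" if rng.random() < 0.9 else "right"
        return "right" if rng.random() < 0.9 else "left"

    # Q-table over (context, response) pairs.
    q = {(c, a): 0.0 for c in ["A", "B"] for a in actions}

    for _ in range(episodes):
        context = rng.choice(["A", "B"])
        actual = participant_response(context)

        # Epsilon-greedy prediction of the participant's next response.
        if rng.random() < epsilon:
            predicted = rng.choice(actions)
        else:
            predicted = max(actions, key=lambda a: q[(context, a)])

        # Reinforce correct predictions; punish incorrect ones (a stand-in
        # for the "with punishment" condition the abstract describes).
        reward = 1.0 if predicted == actual else -1.0
        best_next = max(q[(context, a)] for a in actions)
        q[(context, predicted)] += alpha * (
            reward + gamma * best_next - q[(context, predicted)]
        )
    return q

q = train_q_learning(5000)
# After training, the greedy choice tracks each context's dominant response.
print(max(["left", "right"], key=lambda a: q[("A", a)]))  # expected: left
print(max(["left", "right"], key=lambda a: q[("B", a)]))  # expected: right
```

The update rule is standard Q-learning: the value of the predicted response moves toward the received reward plus the discounted best value available in that context, so responses that yield correct predictions accumulate higher values and dominate the greedy choice.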

