Cox David J, Santos Carlos
Institute of Applied Behavioral Science at Endicott College, Beverly, MA, USA.
Mosaic Pediatric Therapy, Charlotte, NC, USA.
Perspect Behav Sci. 2025 Apr 30;48(2):241-267. doi: 10.1007/s40614-025-00444-6. eCollection 2025 Jun.
The concepts of reinforcement and punishment arose in two disparate scientific domains: psychology and artificial intelligence (AI). Behavior scientists study how biological organisms behave as a function of their environment, whereas AI researchers study how artificial agents behave so as to maximize reward or minimize punishment. This article describes the broad characteristics of AI-based reinforcement learning (RL), how those differ from operant research, and how combining insights from each might advance research in both domains. To demonstrate this mutual utility, 12 artificial organisms (AOs) were built to predict, for each of six participants, the next response the participant would emit. Each AO used one of six feature-set combinations informed by operant research, either with or without punishment of incorrect predictions. A 13th predictive approach, termed "human choice modeled by Q-learning," used the mechanism of Q-learning to update context-response-outcome values following each response and to choose the next response. This approach achieved the highest average predictive accuracy at 95% (range: 90%-99%). The next highest accuracy, averaging 89% (range: 85%-93%), required both molecular and molar information as well as punishment contingencies. Predictions based on only molar or only molecular information, with punishment contingencies, averaged 71%-72% accuracy. Without punishment, prediction accuracy dropped to 47%-54% regardless of the feature set. This work highlights how AI-based RL techniques, combined with operant and respondent domain knowledge, can enhance behavior scientists' ability to predict the behavior of organisms. These techniques also allow researchers to address theoretical questions about important topics such as multiscale models of behavior and the role of punishment in learning.
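For readers unfamiliar with the mechanism the abstract names, the sketch below is a minimal tabular Q-learning loop in Python: values for context-response pairs are updated after each observed response and used to predict the next one. The response set, the parameter values (alpha, gamma, epsilon), and the +1/-1 reward scheme are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

# Illustrative parameters (assumed, not from the paper)
ALPHA = 0.1      # learning rate
GAMMA = 0.9      # discount on future value
EPSILON = 0.05   # exploration rate for response selection
RESPONSES = ["left", "right"]  # hypothetical response options

# Q maps (context, response) pairs to learned values,
# standing in for the abstract's context-response-outcome values.
Q = defaultdict(float)

def predict(context):
    """Choose the response with the highest Q-value (epsilon-greedy)."""
    if random.random() < EPSILON:
        return random.choice(RESPONSES)
    return max(RESPONSES, key=lambda r: Q[(context, r)])

def update(context, response, outcome, next_context):
    """Standard Q-learning update applied after each observed response.

    Under a punishment contingency, `outcome` is +1 for a correct
    prediction and -1 for an incorrect one; without punishment,
    an incorrect prediction would simply yield 0.
    """
    best_next = max(Q[(next_context, r)] for r in RESPONSES)
    td_target = outcome + GAMMA * best_next
    Q[(context, response)] += ALPHA * (td_target - Q[(context, response)])

# One trial of the loop: predict, observe, score, update.
context = "trial_context"
predicted = predict(context)
actual = random.choice(RESPONSES)               # stand-in for observed behavior
outcome = 1.0 if predicted == actual else -1.0  # punishment contingency
update(context, predicted, outcome, next_context="trial_context")
```

The with/without-punishment comparison reported in the abstract corresponds to toggling whether incorrect predictions receive a negative outcome or a neutral one in an update rule of this kind.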