

Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts.

Affiliations

Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America.

Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America.

Publication information

PLoS Comput Biol. 2024 Mar 29;20(3):e1011950. doi: 10.1371/journal.pcbi.1011950. eCollection 2024 Mar.

Abstract

Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
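The parallel-module account above — a learned value signal combined with a static action bias and a hysteresis term driven by recent choices — can be illustrated with a minimal softmax choice model. This is a hedged sketch of the general idea, not the authors' actual model: the function names, parameter values, and the single-trial hysteresis kernel (the paper considers influence from multiple previous actions) are all illustrative assumptions.

```python
import numpy as np

def choice_probabilities(q_values, action_bias, prev_action,
                         beta=3.0, kappa=-0.5):
    """Softmax policy mixing three putative controllers:
    - beta * q_values: learned action values (the 'expert' RL module)
    - action_bias: static per-action preference (a 'nonexpert' bias)
    - kappa * 1[a == prev_action]: hysteresis; kappa > 0 favors
      repetition (perseveration), kappa < 0 favors alternation.
    """
    hysteresis = np.zeros_like(q_values)
    if prev_action is not None:
        hysteresis[prev_action] = 1.0
    logits = beta * q_values + action_bias + kappa * hysteresis
    logits = logits - logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def q_update(q_values, action, reward, alpha=0.2):
    """Standard delta-rule update for the RL module."""
    q_values = q_values.copy()
    q_values[action] += alpha * (reward - q_values[action])
    return q_values
```

In this sketch, a participant with equal learned values but kappa < 0 would still alternate away from the previous button press — the kind of idiosyncratic, reward-independent structure in choice sequences that the model comparison in the paper was designed to detect.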


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8127/10980507/18ddbedd857e/pcbi.1011950.g001.jpg
