An inductive bias for slowly changing features in human reinforcement learning.

Authors

Hedrich Noa L, Schulz Eric, Hall-McMaster Sam, Schuck Nicolas W

Affiliations

Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Berlin, Germany.

Institute of Psychology, Universität Hamburg, Hamburg, Germany.

Publication information

PLoS Comput Biol. 2024 Nov 25;20(11):e1012568. doi: 10.1371/journal.pcbi.1012568. eCollection 2024 Nov.

Abstract

Identifying goal-relevant features in novel environments is a central challenge for efficient behaviour. We asked whether humans address this challenge by relying on prior knowledge about common properties of reward-predicting features. One such property is the rate of change of features, given that behaviourally relevant processes tend to change on a slower timescale than noise. Hence, we asked whether humans are biased to learn more when task-relevant features are slow rather than fast. To test this idea, 295 human participants were asked to learn the rewards of two-dimensional bandits when either a slowly or quickly changing feature of the bandit predicted reward. Across two experiments and one preregistered replication, participants accrued more reward when a bandit's relevant feature changed slowly, and its irrelevant feature quickly, as compared to the opposite. We did not find a difference in the ability to generalise to unseen feature values between conditions. Testing how feature speed could affect learning with a set of four function approximation Kalman filter models revealed that participants had a higher learning rate for the slow feature, and adjusted their learning to both the relevance and the speed of feature changes. The larger the improvement in participants' performance for slow compared to fast bandits, the more strongly they adjusted their learning rates. These results provide evidence that human reinforcement learning favours slower features, suggesting a bias in how humans approach reward learning.
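The abstract does not specify the four models, so the following is only a minimal sketch of the general technique it names: a Kalman filter that learns the weights of a linear reward function over two features. In this sketch the Kalman gain acts as a per-feature learning rate. The function names, the sine-wave features, the noise level, and the choice of the slow feature as the relevant one are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def kalman_update(weights, cov, features, reward, obs_var=1.0, process_var=0.0):
    """One Kalman filter step for a linear reward model r = w . x + noise.

    Returns the updated weights, the updated covariance, and the Kalman
    gain, which plays the role of a per-feature learning rate.
    """
    n = len(weights)
    # Predicted reward and prediction error.
    prediction = sum(w * x for w, x in zip(weights, features))
    error = reward - prediction
    # P x, and the predictive variance x' P x + obs_var.
    px = [sum(cov[i][j] * features[j] for j in range(n)) for i in range(n)]
    pred_var = sum(features[i] * px[i] for i in range(n)) + obs_var
    # Kalman gain: larger prior uncertainty about a weight gives a larger gain.
    gain = [p / pred_var for p in px]
    new_weights = [w + k * error for w, k in zip(weights, gain)]
    # Posterior covariance P - K (P x)', plus optional process noise on the diagonal.
    new_cov = [
        [cov[i][j] - gain[i] * px[j] + (process_var if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return new_weights, new_cov, gain


def simulate(n_trials=200, noise_sd=0.1, seed=0):
    """Learn reward weights for a hypothetical two-feature bandit.

    Assumption for illustration: feature 0 changes slowly and predicts
    reward; feature 1 changes quickly and is irrelevant.
    """
    rng = random.Random(seed)
    true_weights = [1.0, 0.0]
    weights = [0.0, 0.0]
    cov = [[1.0, 0.0], [0.0, 1.0]]
    for t in range(n_trials):
        features = [
            math.sin(2 * math.pi * t / 100),  # slow feature (hypothetical)
            math.sin(2 * math.pi * t / 7),    # fast feature (hypothetical)
        ]
        reward = sum(w * x for w, x in zip(true_weights, features))
        reward += rng.gauss(0.0, noise_sd)
        weights, cov, _ = kalman_update(
            weights, cov, features, reward, obs_var=noise_sd ** 2
        )
    return weights
```

Calling `simulate()` returns the learned weights, which should approach the assumed true weights as trials accumulate. A feature-specific learning-rate bias of the kind the abstract reports could be mimicked in this sketch by giving the slow feature a larger prior variance in `cov`, which raises its gain; whether the authors' models do this is not stated in the abstract.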

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b707/11637442/0a30d1100b08/pcbi.1012568.g001.jpg
