An inductive bias for slowly changing features in human reinforcement learning.

Authors

Hedrich Noa L, Schulz Eric, Hall-McMaster Sam, Schuck Nicolas W

Affiliations

Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Berlin, Germany.

Institute of Psychology, Universität Hamburg, Hamburg, Germany.

Publication information

PLoS Comput Biol. 2024 Nov 25;20(11):e1012568. doi: 10.1371/journal.pcbi.1012568. eCollection 2024 Nov.

DOI:10.1371/journal.pcbi.1012568
PMID:39585903
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11637442/
Abstract

Identifying goal-relevant features in novel environments is a central challenge for efficient behaviour. We asked whether humans address this challenge by relying on prior knowledge about common properties of reward-predicting features. One such property is the rate of change of features, given that behaviourally relevant processes tend to change on a slower timescale than noise. Hence, we asked whether humans are biased to learn more when task-relevant features are slow rather than fast. To test this idea, 295 human participants were asked to learn the rewards of two-dimensional bandits when either a slowly or quickly changing feature of the bandit predicted reward. Across two experiments and one preregistered replication, participants accrued more reward when a bandit's relevant feature changed slowly, and its irrelevant feature quickly, as compared to the opposite. We did not find a difference in the ability to generalise to unseen feature values between conditions. Testing how feature speed could affect learning with a set of four function approximation Kalman filter models revealed that participants had a higher learning rate for the slow feature, and adjusted their learning to both the relevance and the speed of feature changes. The larger the improvement in participants' performance for slow compared to fast bandits, the more strongly they adjusted their learning rates. These results provide evidence that human reinforcement learning favours slower features, suggesting a bias in how humans approach reward learning.
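
The abstract mentions function approximation Kalman filter models in which learning rates can differ between features. The following is a minimal sketch of that general idea, not the authors' four models: a Kalman filter over the weights of a linear reward model, where the Kalman gain plays the role of a per-feature learning rate. The function names, the prior variances (a larger prior variance on the slow feature, standing in for a slowness bias), the noise level, and the toy bandit dynamics are all illustrative assumptions.

```python
import random


def kalman_update(w, P, x, reward, obs_noise):
    """One Kalman filter step for a linear reward model r = w . x + noise.

    w: weight means, P: weight covariance (list of lists), x: feature vector,
    obs_noise: assumed observation noise variance.
    Returns the updated (w, P) and the Kalman gain.
    """
    n = len(w)
    # P x: covariance projected onto the current features
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Predicted reward variance: x' P x + observation noise
    s = sum(x[i] * Px[i] for i in range(n)) + obs_noise
    # The Kalman gain acts as a per-feature learning rate
    gain = [Px[i] / s for i in range(n)]
    error = reward - sum(w[i] * x[i] for i in range(n))
    new_w = [w[i] + gain[i] * error for i in range(n)]
    # P <- P - K (P x)'  (P is symmetric, so x' P equals (P x)')
    new_P = [[P[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return new_w, new_P, gain


def simulate(n_trials=200, seed=0, prior_var=(4.0, 1.0), obs_noise=0.01):
    """Toy condition: the slow feature is relevant, the fast one is not."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    # Assumption: a larger prior variance on the slow feature (index 0)
    # gives it a larger initial gain, i.e. a higher learning rate.
    P = [[prior_var[0], 0.0], [0.0, prior_var[1]]]
    slow = 0.0
    for _ in range(n_trials):
        # Slow feature: bounded random walk with small steps
        slow = max(-1.0, min(1.0, slow + rng.uniform(-0.1, 0.1)))
        # Fast feature: redrawn independently on every trial
        fast = rng.uniform(-1.0, 1.0)
        # Only the slow feature predicts reward in this toy condition
        reward = 1.0 * slow + rng.gauss(0.0, obs_noise ** 0.5)
        w, P, _ = kalman_update(w, P, [slow, fast], reward, obs_noise)
    return w


if __name__ == "__main__":
    print(simulate())
```

With equal feature values on a trial, the feature with the larger prior variance receives the larger gain, which is one simple way a model could express a higher learning rate for slow features. How the paper's models actually parameterise this is not specified in the abstract.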

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b707/11637442/0a30d1100b08/pcbi.1012568.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b707/11637442/16001fa2816d/pcbi.1012568.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b707/11637442/d38603945298/pcbi.1012568.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b707/11637442/a56f8567f924/pcbi.1012568.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b707/11637442/90fc732bbfb0/pcbi.1012568.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b707/11637442/cf919048a312/pcbi.1012568.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b707/11637442/9aa6147c3b28/pcbi.1012568.g007.jpg
