

Humans primarily use model-based inference in the two-stage task.

Affiliations

Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, Zurich, Switzerland.

Publication information

Nat Hum Behav. 2020 Oct;4(10):1053-1066. doi: 10.1038/s41562-020-0905-y. Epub 2020 Jul 6.

DOI: 10.1038/s41562-020-0905-y
PMID: 32632333
Abstract

Distinct model-free and model-based learning processes are thought to drive both typical and dysfunctional behaviours. Data from two-stage decision tasks have seemingly shown that human behaviour is driven by both processes operating in parallel. However, in this study, we show that more detailed task instructions lead participants to make primarily model-based choices that have little, if any, simple model-free influence. We also demonstrate that behaviour in the two-stage task may falsely appear to be driven by a combination of simple model-free and model-based learning if purely model-based agents form inaccurate models of the task because of misconceptions. Furthermore, we report evidence that many participants do misconceive the task in important ways. Overall, we argue that humans formulate a wide variety of learning models. Consequently, the simple dichotomy of model-free versus model-based learning is inadequate to explain behaviour in the two-stage task and connections between reward learning, habit formation and compulsivity.
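The model-based signature the abstract refers to is usually read off stay probabilities in the two-stage task: a purely model-based agent shows a reward-by-transition interaction (it stays after rewarded common and unrewarded rare transitions), whereas a simple model-free agent shows a main effect of reward alone. The sketch below simulates a purely model-based agent to illustrate that interaction. The 0.7 common-transition probability and drifting reward probabilities follow the standard task design; the learning rate, softmax temperature, and drift parameters are illustrative assumptions, not values from this study.

```python
import math
import random

random.seed(0)
COMMON = 0.7            # P(common transition) in the standard two-stage task
ALPHA, BETA = 0.4, 5.0  # learning rate and softmax inverse temperature (illustrative)

def simulate(n_trials=20000):
    """Purely model-based agent: learns second-stage state values by a delta
    rule, but plans first-stage choices through the known transition model."""
    q = [0.5, 0.5]          # learned values of the two second-stage states
    reward_p = [0.6, 0.4]   # drifting per-state reward probabilities
    stay = {(r, c): [0, 0] for r in (True, False) for c in (True, False)}
    prev = None             # (action, rewarded, common) from the previous trial
    for _ in range(n_trials):
        # Plan: expected value of each first-stage action under the model.
        ev = [COMMON * q[a] + (1 - COMMON) * q[1 - a] for a in (0, 1)]
        p0 = 1.0 / (1.0 + math.exp(-BETA * (ev[0] - ev[1])))
        action = 0 if random.random() < p0 else 1
        common = random.random() < COMMON
        state = action if common else 1 - action
        rewarded = random.random() < reward_p[state]
        q[state] += ALPHA * (float(rewarded) - q[state])
        if prev is not None:
            counts = stay[(prev[1], prev[2])]   # keyed by last trial's outcome
            counts[0] += int(action == prev[0])
            counts[1] += 1
        prev = (action, rewarded, common)
        # Bounded Gaussian random walk keeps the reward probabilities drifting.
        for s in (0, 1):
            reward_p[s] = min(0.75, max(0.25, reward_p[s] + random.gauss(0, 0.025)))
    return {k: hits / n for k, (hits, n) in stay.items()}

stay_prob = simulate()
for (rewarded, common), p in sorted(stay_prob.items(), reverse=True):
    label = f"{'reward' if rewarded else 'no reward':9s} {'common' if common else 'rare':6s}"
    print(f"{label}  stay = {p:.2f}")
```

Running this shows the crossover pattern: staying is most likely after reward/common and no-reward/rare trials, exactly the interaction that behavioural analyses of the two-stage task test for. The paper's point is that this signature can be mimicked or distorted when participants hold inaccurate models of the task, so the readout is less diagnostic than it appears.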


Similar articles

1
Humans primarily use model-based inference in the two-stage task.
Nat Hum Behav. 2020 Oct;4(10):1053-1066. doi: 10.1038/s41562-020-0905-y. Epub 2020 Jul 6.
2
Learning reward frequency over reward probability: A tale of two learning rules.
Cognition. 2019 Dec;193:104042. doi: 10.1016/j.cognition.2019.104042. Epub 2019 Aug 17.
3
Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task.
PLoS Comput Biol. 2015 Dec 11;11(12):e1004648. doi: 10.1371/journal.pcbi.1004648. eCollection 2015 Dec.
4
Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.
J Neurosci. 2017 Mar 15;37(11):3009-3017. doi: 10.1523/JNEUROSCI.3205-16.2017. Epub 2017 Feb 13.
5
Stress enhances model-free reinforcement learning only after negative outcome.
PLoS One. 2017 Jul 19;12(7):e0180588. doi: 10.1371/journal.pone.0180588. eCollection 2017.
6
Information about action outcomes differentially affects learning from self-determined versus imposed choices.
Nat Hum Behav. 2020 Oct;4(10):1067-1079. doi: 10.1038/s41562-020-0919-5. Epub 2020 Aug 3.
7
Heterogeneity of strategy use in the Iowa gambling task: a comparison of win-stay/lose-shift and reinforcement learning models.
Psychon Bull Rev. 2013 Apr;20(2):364-71. doi: 10.3758/s13423-012-0324-9.
8
EEG correlates of physical effort and reward processing during reinforcement learning.
J Neurophysiol. 2020 Aug 1;124(2):610-622. doi: 10.1152/jn.00370.2020. Epub 2020 Jul 29.
9
Hold it! The influence of lingering rewards on choice diversification and persistence.
J Exp Psychol Learn Mem Cogn. 2017 Nov;43(11):1752-1767. doi: 10.1037/xlm0000407. Epub 2017 Apr 6.
10
A note on the analysis of two-stage task results: How changes in task structure affect what model-free and model-based strategies predict about the effects of reward and transition on the stay probability.
PLoS One. 2018 Apr 3;13(4):e0195328. doi: 10.1371/journal.pone.0195328. eCollection 2018.

Cited by

1
A foundation model to predict and capture human cognition.
Nature. 2025 Jul 2. doi: 10.1038/s41586-025-09215-4.
2
Discovering cognitive strategies with tiny recurrent neural networks.
Nature. 2025 Jul 2. doi: 10.1038/s41586-025-09142-4.
3
Model-based algorithms shape automatic evaluative processing.
Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2417068122. doi: 10.1073/pnas.2417068122. Epub 2025 Jun 20.
4
Imbalanced goal-directed and habitual control in individuals with internet gaming disorder.
J Behav Addict. 2025 Apr 28;14(2):831-845. doi: 10.1556/2006.2025.00037. Print 2025 Jul 2.
5
Noradrenergic and Dopaminergic modulation of meta-cognition and meta-control.
PLoS Comput Biol. 2025 Feb 26;21(2):e1012675. doi: 10.1371/journal.pcbi.1012675. eCollection 2025 Feb.
6
Signatures of Perseveration and Heuristic-Based Directed Exploration in Two-Step Sequential Decision Task Behaviour.
Comput Psychiatr. 2025 Feb 11;9(1):39-62. doi: 10.5334/cpsy.101. eCollection 2025.
7
Maturation of striatal dopamine supports the development of habitual behavior through adolescence.
bioRxiv. 2025 Jan 6:2025.01.06.631527. doi: 10.1101/2025.01.06.631527.
8
Impact of provoked stress on model-free and model-based reinforcement learning in individuals with alcohol use disorder.
Addict Behav Rep. 2024 Nov 23;20:100574. doi: 10.1016/j.abrep.2024.100574. eCollection 2024 Dec.
9
Understanding dual process cognition via the minimum description length principle.
PLoS Comput Biol. 2024 Oct 18;20(10):e1012383. doi: 10.1371/journal.pcbi.1012383. eCollection 2024 Oct.
10
Memory for rewards guides retrieval.
Commun Psychol. 2024 Apr 16;2(1):31. doi: 10.1038/s44271-024-00074-9.

References

1
Model-Free RL or Action Sequences?
Front Psychol. 2019 Dec 20;10:2892. doi: 10.3389/fpsyg.2019.02892. eCollection 2019.
2
Credit assignment to state-independent task representations and its relationship with model-based decision making.
Proc Natl Acad Sci U S A. 2019 Aug 6;116(32):15871-15876. doi: 10.1073/pnas.1821647116. Epub 2019 Jul 18.
3
The successor representation in human reinforcement learning.
Nat Hum Behav. 2017 Sep;1(9):680-692. doi: 10.1038/s41562-017-0180-8. Epub 2017 Aug 28.
4
Holistic Reinforcement Learning: The Role of Structure and Attention.
Trends Cogn Sci. 2019 Apr;23(4):278-292. doi: 10.1016/j.tics.2019.01.010. Epub 2019 Feb 26.
5
Improving the reliability of model-based decision-making estimates in the two-stage decision task with reaction-times and drift-diffusion modeling.
PLoS Comput Biol. 2019 Feb 13;15(2):e1006803. doi: 10.1371/journal.pcbi.1006803. eCollection 2019 Feb.
6
Habits without values.
Psychol Rev. 2019 Mar;126(2):292-311. doi: 10.1037/rev0000120. Epub 2019 Jan 24.
7
Are we of two minds?
Nat Neurosci. 2018 Nov;21(11):1497-1499. doi: 10.1038/s41593-018-0258-2.
8
Planning Complexity Registers as a Cost in Metacontrol.
J Cogn Neurosci. 2018 Oct;30(10):1391-1404. doi: 10.1162/jocn_a_01263. Epub 2018 Apr 18.
9
Dorsal hippocampus contributes to model-based planning.
Nat Neurosci. 2017 Sep;20(9):1269-1276. doi: 10.1038/nn.4613. Epub 2017 Jul 31.
10
Cost-Benefit Arbitration Between Multiple Reinforcement-Learning Systems.
Psychol Sci. 2017 Sep;28(9):1321-1333. doi: 10.1177/0956797617708288. Epub 2017 Jul 21.