Structure learning in human sequential decision-making.

Affiliation

Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, United States of America.

Publication information

PLoS Comput Biol. 2010 Dec 2;6(12):e1001003. doi: 10.1371/journal.pcbi.1001003.

DOI:10.1371/journal.pcbi.1001003
PMID:21151963
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2996460/
Abstract

Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. Rather than being suboptimal, we argue that the learning problem humans face is more complex, in that it also involves learning the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate humans can perform structure learning in a near-optimal manner.
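As a rough illustration of what "learning the structure of reward generation" can mean, the sketch below compares two candidate reward structures for a pair of Bernoulli arms by Bayesian model comparison. It is not the authors' model. The coupling rule (arm 2 pays with probability 1 - theta when arm 1 pays with probability theta), the uniform Beta(1, 1) priors, and all function names are assumptions made for the example.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_evidence_independent(s1: int, f1: int, s2: int, f2: int) -> float:
    """Log marginal likelihood when each arm has its own reward probability.

    Each arm gets an independent uniform Beta(1, 1) prior (an assumption).
    s = rewarded pulls, f = unrewarded pulls for that arm.
    """
    return log_beta(1 + s1, 1 + f1) + log_beta(1 + s2, 1 + f2)


def log_evidence_coupled(s1: int, f1: int, s2: int, f2: int) -> float:
    """Log marginal likelihood when one parameter theta drives both arms.

    Assumed coupling: arm 1 pays with probability theta, arm 2 with 1 - theta,
    so a failure on arm 2 counts as evidence for a high theta.
    """
    return log_beta(1 + s1 + f2, 1 + f1 + s2)


def posterior_coupled(s1: int, f1: int, s2: int, f2: int,
                      prior_coupled: float = 0.5) -> float:
    """Posterior probability that the coupled structure generated the data."""
    log_c = log_evidence_coupled(s1, f1, s2, f2)
    log_i = log_evidence_independent(s1, f1, s2, f2)
    # Subtract the larger log evidence before exponentiating for stability.
    m = max(log_c, log_i)
    w_c = prior_coupled * exp(log_c - m)
    w_i = (1.0 - prior_coupled) * exp(log_i - m)
    return w_c / (w_c + w_i)


if __name__ == "__main__":
    # Arm 1 mostly pays and arm 2 mostly does not: consistent with coupling.
    print(posterior_coupled(8, 2, 2, 8))
    # Both arms mostly pay: hard to explain with a single shared theta.
    print(posterior_coupled(8, 2, 8, 2))
```

With anti-correlated outcomes the posterior shifts toward the coupled structure, and when both arms pay well it shifts toward independent arms. An agent that maintains such a belief over structures has a reason to sample the apparently worse arm, since those pulls are informative about the structure itself.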

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1070/2996460/e2812975d78e/pcbi.1001003.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1070/2996460/c61da37bcd33/pcbi.1001003.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1070/2996460/c758649fe18b/pcbi.1001003.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1070/2996460/db64956d40ce/pcbi.1001003.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1070/2996460/ae46b6603fe8/pcbi.1001003.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1070/2996460/1ff159c20cca/pcbi.1001003.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1070/2996460/2a0386a9d28d/pcbi.1001003.g007.jpg

Similar articles

1
Structure learning in human sequential decision-making.
PLoS Comput Biol. 2010 Dec 2;6(12):e1001003. doi: 10.1371/journal.pcbi.1001003.
2
Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.
J Neurosci. 2007 Nov 21;27(47):12860-7. doi: 10.1523/JNEUROSCI.2496-07.2007.
3
Mouse tracking reveals structure knowledge in the absence of model-based choice.
Nat Commun. 2020 Apr 20;11(1):1893. doi: 10.1038/s41467-020-15696-w.
4
Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
5
One-shot learning and behavioral eligibility traces in sequential decision making.
Elife. 2019 Nov 11;8:e47463. doi: 10.7554/eLife.47463.
6
Similarities and differences in spatial and non-spatial cognitive maps.
PLoS Comput Biol. 2020 Sep 9;16(9):e1008149. doi: 10.1371/journal.pcbi.1008149. eCollection 2020 Sep.
7
[Mathematical models of decision making and learning].
Brain Nerve. 2008 Jul;60(7):791-8.
8
Credit assignment in movement-dependent reinforcement learning.
Proc Natl Acad Sci U S A. 2016 Jun 14;113(24):6797-802. doi: 10.1073/pnas.1523669113. Epub 2016 May 31.
9
The actor-critic learning is behind the matching law: matching versus optimal behaviors.
Neural Comput. 2008 Jan;20(1):227-51. doi: 10.1162/neco.2008.20.1.227.
10
Moderate confirmation bias enhances decision-making in groups of reinforcement-learning agents.
PLoS Comput Biol. 2024 Sep 4;20(9):e1012404. doi: 10.1371/journal.pcbi.1012404. eCollection 2024 Sep.

Cited by

1
Identifying Transfer Learning in the Reshaping of Inductive Biases.
Open Mind (Camb). 2024 Sep 15;8:1107-1128. doi: 10.1162/opmi_a_00158. eCollection 2024.
2
Overharvesting in human patch foraging reflects rational structure learning and adaptive planning.
Proc Natl Acad Sci U S A. 2023 Mar 28;120(13):e2216524120. doi: 10.1073/pnas.2216524120. Epub 2023 Mar 24.
3
Model Sharing in the Human Medial Temporal Lobe.
J Neurosci. 2022 Jul 6;42(27):5410-5426. doi: 10.1523/JNEUROSCI.1978-21.2022. Epub 2022 May 23.
4
A Bayesian Account of Generalist and Specialist Formation Under the Active Inference Framework.
Front Artif Intell. 2020 Sep 3;3:69. doi: 10.3389/frai.2020.00069. eCollection 2020.
5
Retrospective Inference as a Form of Bounded Rationality, and Its Beneficial Influence on Learning.
Front Artif Intell. 2020 Feb 18;3:2. doi: 10.3389/frai.2020.00002. eCollection 2020.
6
Making the Environment an Informative Place: A Conceptual Analysis of Epistemic Policies and Sensorimotor Coordination.
Entropy (Basel). 2019 Mar 30;21(4):350. doi: 10.3390/e21040350.
7
Optimal Query Selection Using Multi-Armed Bandits.
IEEE Signal Process Lett. 2018 Dec;25(12):1870-1874. doi: 10.1109/LSP.2018.2878066. Epub 2018 Oct 26.
8
Models that learn how humans learn: The case of decision-making and its disorders.
PLoS Comput Biol. 2019 Jun 11;15(6):e1006903. doi: 10.1371/journal.pcbi.1006903. eCollection 2019 Jun.
9
Deconstructing the human algorithms for exploration.
Cognition. 2018 Apr;173:34-42. doi: 10.1016/j.cognition.2017.12.014. Epub 2017 Dec 29.
10
A unifying Bayesian account of contextual effects in value-based choice.
PLoS Comput Biol. 2017 Oct 5;13(10):e1005769. doi: 10.1371/journal.pcbi.1005769. eCollection 2017 Oct.

References

1
Sequential effects: Superstition or rational behavior?
Adv Neural Inf Process Syst. 2008;21:1873-1880.
2
A hierarchical Bayesian model of human decision-making on an optimal stopping problem.
Cogn Sci. 2006 May 6;30(3):1-26. doi: 10.1207/s15516709cog0000_69.
3
Learning latent structure: carving nature at its joints.
Curr Opin Neurobiol. 2010 Apr;20(2):251-6. doi: 10.1016/j.conb.2010.02.008. Epub 2010 Mar 11.
4
Structure learning in action.
Behav Brain Res. 2010 Jan 20;206(2):157-65. doi: 10.1016/j.bbr.2009.08.031. Epub 2009 Aug 29.
5
When does reward maximization lead to matching law?
PLoS One. 2008;3(11):e3795. doi: 10.1371/journal.pone.0003795. Epub 2008 Nov 24.
6
Integrating hippocampus and striatum in decision-making.
Curr Opin Neurobiol. 2007 Dec;17(6):692-7. doi: 10.1016/j.conb.2008.01.003. Epub 2008 Mar 4.
7
The actor-critic learning is behind the matching law: matching versus optimal behaviors.
Neural Comput. 2008 Jan;20(1):227-51. doi: 10.1162/neco.2008.20.1.227.
8
Learning the value of information in an uncertain world.
Nat Neurosci. 2007 Sep;10(9):1214-21. doi: 10.1038/nn1954. Epub 2007 Aug 5.
9
Theory-based Bayesian models of inductive learning and reasoning.
Trends Cogn Sci. 2006 Jul;10(7):309-18. doi: 10.1016/j.tics.2006.05.009. Epub 2006 Jun 22.
10
Cortical substrates for exploratory decisions in humans.
Nature. 2006 Jun 15;441(7095):876-9. doi: 10.1038/nature04766.