Revisiting the Role of Uncertainty-Driven Exploration in a (Perceived) Non-Stationary World.

Authors

Guo Dalin, Yu Angela J

Affiliations

Department of Cognitive Science, University of California, San Diego, La Jolla, CA 92093, USA.

Department of Cognitive Science & Halıcıoğlu Data Science Institute, University of California, San Diego, La Jolla, CA 92093, USA.

Publication information

Cogsci. 2021 Jul;43:2045-2051.

PMID: 34368809
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8341546/
Abstract

Humans are often faced with an exploration-versus-exploitation trade-off. A commonly used paradigm, multi-armed bandit, has shown humans to exhibit an "uncertainty bonus", which combines with estimated reward to drive exploration. However, previous studies often modeled belief updating using either a Bayesian model that assumed the reward contingency to remain stationary, or a reinforcement learning model. Separately, we previously showed that human learning in the bandit task is best captured by a dynamic-belief Bayesian model. We hypothesize that the estimated uncertainty bonus may depend on which learning model is employed. Here, we re-analyze a bandit dataset using all three learning models. We find that the dynamic-belief model captures human choice behavior best, while also uncovering a much larger uncertainty bonus than the other models. More broadly, our results also emphasize the importance of an appropriate learning model, as it is crucial for correctly characterizing the processes underlying human decision making.
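
To make the contrast between learning models concrete, here is a minimal Python sketch, not the authors' fitted models. It assumes binary rewards, a reward-rate grid `GRID`, a flat prior, and a dynamic-belief update in which each arm's belief is mixed with the prior on every trial with weight `1 - gamma` (so `gamma = 1` recovers the stationary Bayesian learner). The choice rule, a softmax over posterior mean plus `bonus` times posterior standard deviation, is likewise an illustrative assumption about how an uncertainty bonus could combine with estimated reward. All function and parameter names are hypothetical.

```python
import numpy as np

# Discretized candidate reward rates (an assumption made for simplicity).
GRID = np.linspace(0.005, 0.995, 100)


def flat_prior():
    """Uniform prior belief over the reward-rate grid."""
    return np.full(GRID.size, 1.0 / GRID.size)


def dbm_update(belief, prior, reward, gamma):
    """One trial of a dynamic-belief style update for a single arm.

    gamma is the assumed probability that the reward rate stayed the same;
    gamma = 1.0 gives the stationary (fixed-belief) Bayesian update.
    reward is 1, 0, or None when the arm was not chosen on this trial.
    """
    # Before seeing data, allow for a possible change by mixing in the prior.
    predictive = gamma * belief + (1.0 - gamma) * prior
    if reward is None:
        return predictive
    likelihood = GRID if reward == 1 else 1.0 - GRID
    posterior = likelihood * predictive
    return posterior / posterior.sum()


def mean_and_sd(belief):
    """Posterior mean and standard deviation of the reward rate."""
    mean = float(np.sum(GRID * belief))
    var = float(np.sum((GRID - mean) ** 2 * belief))
    return mean, float(np.sqrt(var))


def choice_probs(beliefs, bonus, temperature):
    """Softmax over (estimated reward + bonus * uncertainty) for each arm."""
    stats = [mean_and_sd(b) for b in beliefs]
    values = np.array([m + bonus * s for m, s in stats])
    z = (values - values.max()) / temperature  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()


if __name__ == "__main__":
    prior = flat_prior()
    beliefs = [prior.copy(), prior.copy()]
    # Arm 0 is chosen twice (one reward, one non-reward); arm 1 is never chosen.
    for reward in (1, 0):
        beliefs[0] = dbm_update(beliefs[0], prior, reward, gamma=0.8)
        beliefs[1] = dbm_update(beliefs[1], prior, None, gamma=0.8)
    print(choice_probs(beliefs, bonus=1.0, temperature=0.1))
```

Under these assumptions, a learner with `gamma < 1` keeps more uncertainty about arms it has not sampled recently, which is one way the choice of learning model could change how large an uncertainty bonus is needed to explain the same choices.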
