
Revisiting the Role of Uncertainty-Driven Exploration in a (Perceived) Non-Stationary World.

Authors

Guo Dalin, Yu Angela J

Affiliations

Department of Cognitive Science, University of California, San Diego, La Jolla, CA 92093, USA.

Department of Cognitive Science & Halıcıoğlu Data Science Institute, University of California, San Diego, La Jolla, CA 92093, USA.

Publication

CogSci. 2021 Jul;43:2045-2051.

Abstract

Humans are often faced with an exploration-versus-exploitation trade-off. In a commonly used paradigm, the multi-armed bandit task, humans have been shown to exhibit an "uncertainty bonus", which combines with the estimated reward to drive exploration. However, previous studies often modeled belief updating using either a Bayesian model that assumed the reward contingencies to remain stationary, or a reinforcement learning model. Separately, we previously showed that human learning in the bandit task is best captured by a dynamic-belief Bayesian model, which assumes non-stationary reward contingencies. We hypothesize that the estimated uncertainty bonus may depend on which learning model is employed. Here, we re-analyze a bandit dataset using all three learning models. We find that the dynamic-belief model captures human choice behavior best, while also uncovering a much larger uncertainty bonus than that estimated by the other models. More broadly, our results emphasize the importance of choosing an appropriate learning model, as it is crucial for correctly characterizing the processes underlying human decision making.
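To make the contrast concrete, the following is a minimal sketch (not the paper's fitted model) of how a dynamic-belief Bayesian learner for a Bernoulli bandit arm can be paired with an uncertainty-bonus choice rule. The grid discretization, the persistence probability `gamma = 0.8`, the uniform prior, and the bonus weight `0.5` are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def dbm_update(posterior, reward, grid, gamma=0.8):
    """One dynamic-belief update for a Bernoulli arm on a discretized grid.

    With probability gamma the arm's reward rate persists from the last
    trial; with probability 1 - gamma it is redrawn from a fixed prior
    (uniform here). This change-point mixture is what distinguishes the
    dynamic-belief model from a stationary Bayesian learner.
    """
    prior = np.ones_like(grid) / len(grid)                # generic prior
    predictive = gamma * posterior + (1 - gamma) * prior  # change-point mixture
    likelihood = grid if reward else (1.0 - grid)         # Bernoulli likelihood
    posterior = predictive * likelihood
    return posterior / posterior.sum()

def choice_values(posteriors, grid, bonus=0.5):
    """Estimated reward plus an uncertainty bonus (posterior s.d.) per arm."""
    vals = []
    for p in posteriors:
        mean = (grid * p).sum()
        sd = np.sqrt(((grid - mean) ** 2 * p).sum())
        vals.append(mean + bonus * sd)
    return np.array(vals)

# Two arms, uniform initial beliefs; arm 0 observes one reward.
grid = np.linspace(0.01, 0.99, 99)
posteriors = [np.ones_like(grid) / len(grid) for _ in range(2)]
posteriors[0] = dbm_update(posteriors[0], reward=1, grid=grid)
print(choice_values(posteriors, grid))
```

In a stationary Bayesian model the posterior uncertainty shrinks monotonically with every observation, whereas the change-point mixture in `dbm_update` keeps uncertainty from collapsing; this difference in the maintained uncertainty is one route by which the inferred bonus can differ across learning models.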



