Guo Dalin, Yu Angela J
Department of Cognitive Science, University of California, San Diego, La Jolla, CA 92093, USA.
Department of Cognitive Science & Halıcıoğlu Data Science Institute, University of California, San Diego, La Jolla, CA 92093, USA.
Cogsci. 2021 Jul;43:2045-2051.
Humans often face an exploration-versus-exploitation trade-off. Studies using a common paradigm, the multi-armed bandit task, have shown that humans exhibit an "uncertainty bonus", which combines with estimated reward to drive exploration. However, previous studies typically modeled belief updating with either a Bayesian model that assumes reward contingencies remain stationary, or a reinforcement learning model. Separately, we previously showed that human learning in the bandit task is best captured by a dynamic-belief Bayesian model. We hypothesize that the estimated uncertainty bonus may depend on which learning model is employed. Here, we re-analyze a bandit dataset using all three learning models: the stationary Bayesian model, the reinforcement learning model, and the dynamic-belief model. We find that the dynamic-belief model captures human choice behavior best and also uncovers a much larger uncertainty bonus than the other two models. More broadly, our results emphasize the importance of choosing an appropriate learning model, as it is crucial for correctly characterizing the processes underlying human decision making.
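To make the two ingredients of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary (Bernoulli) rewards, a discretized grid over reward rates, a flat generic prior, an uncertainty bonus proportional to the posterior standard deviation, and a softmax choice rule; the abstract specifies none of these details. All names (`GRID`, `PRIOR`, `dbm_predict`, `dbm_update`, `choice_probs`, `gamma`, `bonus_weight`, `inv_temp`) are hypothetical. Setting `gamma = 1.0` recovers a stationary Bayesian learner, which is one way to see how the learning model can change the uncertainty estimates that the bonus acts on.

```python
import numpy as np

# Assumption: reward rates are discretized on a grid with a flat generic prior.
GRID = np.linspace(0.005, 0.995, 100)
PRIOR = np.full(GRID.size, 1.0 / GRID.size)


def dbm_predict(posterior, gamma):
    """Dynamic-belief step: with probability gamma the reward rate persists,
    otherwise it is assumed to be redrawn from the generic prior."""
    return gamma * posterior + (1.0 - gamma) * PRIOR


def dbm_update(belief, reward):
    """Bayes update of a grid belief after a binary reward (1 or 0)."""
    likelihood = GRID if reward == 1 else 1.0 - GRID
    post = belief * likelihood
    return post / post.sum()


def choice_probs(beliefs, bonus_weight, inv_temp):
    """Softmax over estimated reward plus an uncertainty bonus.

    beliefs has shape (n_arms, GRID.size); the bonus is assumed to be
    bonus_weight times the posterior standard deviation of each arm.
    """
    means = beliefs @ GRID
    variances = beliefs @ GRID**2 - means**2
    stds = np.sqrt(np.maximum(variances, 0.0))
    values = means + bonus_weight * stds
    z = inv_temp * (values - values.max())  # subtract max for stability
    p = np.exp(z)
    return p / p.sum()


if __name__ == "__main__":
    gamma = 0.8  # hypothetical stability parameter
    beliefs = np.tile(PRIOR, (2, 1))  # two arms, both start at the prior
    for reward in [1, 0, 1, 1]:  # arm 0 is sampled four times
        beliefs[0] = dbm_update(dbm_predict(beliefs[0], gamma), reward)
        beliefs[1] = dbm_predict(beliefs[1], gamma)  # unchosen arm drifts to prior
    print(choice_probs(beliefs, bonus_weight=0.0, inv_temp=5.0))
    print(choice_probs(beliefs, bonus_weight=1.0, inv_temp=5.0))
```

In this sketch, raising `bonus_weight` shifts choice probability toward the arm whose belief is more uncertain, which is the qualitative effect the uncertainty bonus is meant to describe.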