University of New South Wales, Sydney, Australia.
PLoS Comput Biol. 2011 Jan 20;7(1):e1001048. doi: 10.1371/journal.pcbi.1001048.
Recently, evidence has emerged that humans approach learning using Bayesian updating rather than (model-free) reinforcement algorithms in a six-arm restless bandit problem. Here, we investigate what this implies for human appreciation of uncertainty. In our task, a Bayesian learner distinguishes three equally salient levels of uncertainty. First, the Bayesian learner perceives irreducible uncertainty or risk: even knowing the payoff probabilities of a given arm, the outcome remains uncertain. Second, there is (parameter) estimation uncertainty or ambiguity: payoff probabilities are unknown and need to be estimated. Third, the outcome probabilities of the arms change: the sudden jumps are referred to as unexpected uncertainty. We document how the three levels of uncertainty evolved during the course of our experiment and how they affected the learning rate. We then zoom in on estimation uncertainty, which has been suggested to be a driving force in exploration, in spite of evidence of widespread aversion to ambiguity. Our data corroborate the latter, that is, aversion to ambiguity. We discuss neural evidence that foreshadowed the ability of humans to distinguish between the three levels of uncertainty. Finally, we investigate the boundaries of human capacity to implement Bayesian learning. We repeat the experiment with different instructions, reflecting varying levels of structural uncertainty. Under this fourth notion of uncertainty, choices were no better explained by Bayesian updating than by (model-free) reinforcement learning. Exit questionnaires revealed that participants remained unaware of the presence of unexpected uncertainty and failed to acquire the right model with which to implement Bayesian updating.
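The distinction between the three levels of uncertainty, and between Bayesian updating and model-free reinforcement learning, can be illustrated with a small sketch for a single arm. This is not the model fitted in the study. It assumes binary payoffs, a known per-trial jump probability (`hazard`), a jump that redraws the payoff probability from a flat prior, and a discretized grid of candidate probabilities. All names (`GRID`, `PRIOR`, `bayes_update`, `uncertainty`, `delta_rule`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Assumption: payoff probability p is discretized on a grid with a flat prior.
GRID = np.linspace(0.005, 0.995, 100)
PRIOR = np.full(GRID.size, 1.0 / GRID.size)


def bayes_update(belief, reward, hazard):
    """One trial of Bayesian filtering for one arm with binary payoffs."""
    likelihood = GRID if reward == 1 else 1.0 - GRID
    posterior = belief * likelihood
    posterior /= posterior.sum()
    # Unexpected uncertainty (assumed form): with probability `hazard`
    # the arm jumps and p is redrawn from the flat prior.
    return (1.0 - hazard) * posterior + hazard * PRIOR


def uncertainty(belief):
    """Return (mean payoff probability, risk, estimation uncertainty)."""
    mean = float(np.dot(belief, GRID))
    # Risk: expected outcome variance that remains even if p were known.
    risk = float(np.dot(belief, GRID * (1.0 - GRID)))
    # Estimation uncertainty: posterior variance of p itself.
    estimation = float(np.dot(belief, (GRID - mean) ** 2))
    return mean, risk, estimation


def delta_rule(value, reward, learning_rate=0.1):
    """Model-free comparison: fixed-rate update with no explicit uncertainty."""
    return value + learning_rate * (reward - value)


# Illustrative run on a made-up reward sequence.
belief, value = PRIOR.copy(), 0.5
for reward in [1, 1, 1, 0, 1, 1]:
    belief = bayes_update(belief, reward, hazard=0.05)
    value = delta_rule(value, reward)
mean, risk, estimation = uncertainty(belief)
```

In this sketch, risk and estimation uncertainty add up to the total predictive variance of the next outcome, `mean * (1 - mean)`, by the law of total variance. The delta rule tracks only a point estimate with a fixed learning rate, whereas the Bayesian learner's effective learning rate depends on how much estimation uncertainty remains and on the assumed jump probability.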