Department of Psychology and Center for Brain Science, Harvard University, United States.
Cognition. 2018 Apr;173:34-42. doi: 10.1016/j.cognition.2017.12.014. Epub 2017 Dec 29.
The dilemma between information gathering (exploration) and reward seeking (exploitation) is a fundamental problem for reinforcement learning agents. How humans resolve this dilemma is still an open question, because experiments have provided equivocal evidence about the underlying algorithms used by humans. We show that two families of algorithms can be distinguished in terms of how uncertainty affects exploration. Algorithms based on uncertainty bonuses predict a change in response bias as a function of uncertainty, whereas algorithms based on sampling predict a change in response slope. Two experiments provide evidence for both bias and slope changes, and computational modeling confirms that a hybrid model is the best quantitative account of the data.
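The following is a minimal sketch, not the paper's fitted model, illustrating the distinction the abstract draws: an uncertainty-bonus (UCB-style) rule adds relative uncertainty to the value difference and so shifts the bias (intercept) of the choice function, whereas a posterior-sampling (Thompson-style) rule divides the value difference by total uncertainty and so changes its slope. The parameters `gamma` and `beta`, and the Gaussian-posterior assumption, are illustrative choices, not values from the study.

```python
# Sketch contrasting an uncertainty bonus vs. posterior sampling in a
# two-armed bandit. mu1, mu2: posterior mean rewards; s1, s2: posterior
# standard deviations. gamma, beta are assumed free parameters.
import numpy as np
from scipy.stats import norm

def p_choose1_ucb(mu1, mu2, s1, s2, gamma=1.0, beta=5.0):
    """Uncertainty bonus: relative uncertainty (s1 - s2) is added to the
    value difference, shifting the bias (intercept) of the choice curve."""
    return 1.0 / (1.0 + np.exp(-beta * ((mu1 - mu2) + gamma * (s1 - s2))))

def p_choose1_thompson(mu1, mu2, s1, s2):
    """Posterior sampling with Gaussian posteriors: the value difference is
    scaled by total uncertainty, changing the slope of the choice curve."""
    total_sd = np.sqrt(s1**2 + s2**2)
    return norm.cdf((mu1 - mu2) / total_sd)

# Equal means, but option 1 is more uncertain:
print(p_choose1_ucb(0.5, 0.5, 1.0, 0.2))       # > 0.5: biased toward option 1
print(p_choose1_thompson(0.5, 0.5, 1.0, 0.2))  # = 0.5, but a shallower slope in (mu1 - mu2)
```

A hybrid model of the kind the abstract describes would combine both terms, letting uncertainty move the intercept and the slope of the choice function at once.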