MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom.
Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom.
PLoS One. 2022 Sep 12;17(9):e0274272. doi: 10.1371/journal.pone.0274272. eCollection 2022.
When comparing the performance of multi-armed bandit algorithms, the potential impact of missing data is often overlooked. In practice, missing data also affects how these algorithms are implemented: the simplest approach is to continue sampling according to the original bandit algorithm and ignore missing outcomes. We investigate how this approach affects the performance of several bandit algorithms through an extensive simulation study, assuming the rewards are missing at random. We focus on two-armed bandit algorithms with binary outcomes in the context of patient allocation for clinical trials with relatively small sample sizes. However, our results apply to other applications of bandit algorithms where missing data is expected to occur. We assess the resulting operating characteristics, including the expected reward, and consider different probabilities of missingness in the two arms. The key finding of our work is that, when using the simplest strategy of ignoring missing data, the impact on the expected performance of multi-armed bandit strategies varies according to the way these strategies balance the exploration-exploitation trade-off. Algorithms that are geared towards exploration continue to assign samples to the arm with more missing responses (which, being perceived as the arm with less observed information, is deemed more appealing by the algorithm than it would otherwise be). In contrast, algorithms that are geared towards exploitation rapidly assign a high value to samples from the arms with a currently high mean, irrespective of the number of observations per arm. Furthermore, for algorithms focusing more on exploration, we illustrate that the problem of missing responses can be alleviated using a simple mean imputation approach.
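The setting described above can be illustrated with a small simulation. The following Python sketch is not the authors' code: the function name `run_trial`, its parameters, the Beta(1, 1) prior, and the choice of Thompson sampling and a greedy rule as stand-ins for "several bandit algorithms" are all assumptions made for illustration. It simulates a two-armed bandit with binary outcomes, drops each outcome with an arm-specific probability, and either ignores missing outcomes or imputes them with the arm's current estimated mean (one possible reading of "simple mean imputation").

```python
import random


def run_trial(p_success, p_missing, n_patients, rule="thompson",
              impute=False, seed=0):
    """Simulate one two-armed trial with binary outcomes (hypothetical helper).

    p_success: true success probability of each arm (assumed values).
    p_missing: probability that an outcome on each arm is unobserved.
    rule: "thompson" (posterior sampling) or "greedy" (highest current mean).
    impute: if True, a missing outcome is replaced by the arm's current
            estimated mean; if False, it is simply ignored.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior pseudo-counts per arm (an assumption of this sketch).
    succ = [1.0, 1.0]
    fail = [1.0, 1.0]
    allocations = [0, 0]
    total_reward = 0

    for _ in range(n_patients):
        if rule == "thompson":
            scores = [rng.betavariate(succ[a], fail[a]) for a in (0, 1)]
        elif rule == "greedy":
            scores = [succ[a] / (succ[a] + fail[a]) for a in (0, 1)]
        else:
            raise ValueError(f"unknown rule: {rule}")

        # Pick the arm with the larger score, breaking ties at random.
        if scores[0] == scores[1]:
            arm = rng.randrange(2)
        else:
            arm = 0 if scores[0] > scores[1] else 1
        allocations[arm] += 1

        # The patient's true outcome counts towards the reward even when
        # it is never observed by the algorithm (an assumption here).
        outcome = 1 if rng.random() < p_success[arm] else 0
        total_reward += outcome

        missing = rng.random() < p_missing[arm]
        if not missing:
            succ[arm] += outcome
            fail[arm] += 1 - outcome
        elif impute:
            # Mean imputation: add a fractional pseudo-observation equal
            # to the arm's current estimated success probability.
            mean = succ[arm] / (succ[arm] + fail[arm])
            succ[arm] += mean
            fail[arm] += 1 - mean
        # Otherwise the missing outcome is ignored and no update is made.

    return allocations, total_reward


if __name__ == "__main__":
    for rule in ("thompson", "greedy"):
        for impute in (False, True):
            alloc, reward = run_trial((0.3, 0.5), (0.0, 0.5), 100,
                                      rule=rule, impute=impute, seed=1)
            print(rule, "impute" if impute else "ignore", alloc, reward)
```

A single run like this is noisy; to compare operating characteristics in the spirit of the study, one would average allocations and rewards over many seeds and over a grid of missingness probabilities.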