Truong Lan V, Scarlett Jonathan
Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK.
Department of Computer Science & Department of Mathematics, National University of Singapore, Singapore 117418, Singapore.
Entropy (Basel). 2020 Jul 20;22(7):788. doi: 10.3390/e22070788.
In this paper, we consider techniques for establishing lower bounds on the number of arm pulls for best-arm identification in the multi-armed bandit problem. While a recent divergence-based approach was shown to provide improvements over an older gap-based approach, we show that the latter can be refined to match the former (up to constant factors) in many cases of interest under Bernoulli rewards, including the case that the rewards are bounded away from zero and one. Together with existing upper bounds, this indicates that the divergence-based and gap-based approaches are both effective for establishing sample complexity lower bounds for best-arm identification.
在本文中,我们考虑在多臂老虎机问题中确定最佳臂识别所需拉臂次数下限的技术。虽然最近基于散度的方法相较于旧的基于差距的方法有改进,但我们表明,在伯努利奖励下的许多感兴趣的情况下,包括奖励远离零和一的情况,后者可以得到改进以与前者匹配(相差常数因子)。结合现有的上限,这表明基于散度的方法和基于差距的方法对于确定最佳臂识别的样本复杂度下限都是有效的。