University of Tokyo, Bunkyo-ku, Tokyo, 113-0333, Japan, and RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo 103-0027, Japan
RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo 103-0027, Japan
Neural Comput. 2020 Sep;32(9):1733-1773. doi: 10.1162/neco_a_01299. Epub 2020 Jul 20.
We study the problem of stochastic multiple-arm identification, where an agent sequentially explores a size- subset of arms (also known as a ) from given arms and tries to identify the best super arm. Most work so far has considered the semi-bandit setting, where the agent can observe the reward of each pulled arm or assumed each arm can be queried at each round. However, in real-world applications, it is costly or sometimes impossible to observe a reward of individual arms. In this study, we tackle the full-bandit setting, where only a noisy observation of the total sum of a super arm is given at each pull. Although our problem can be regarded as an instance of the best arm identification in linear bandits, a naive approach based on linear bandits is computationally infeasible since the number of super arms is exponential. To cope with this problem, we first design a polynomial-time approximation algorithm for a 0-1 quadratic programming problem arising in confidence ellipsoid maximization. Based on our approximation algorithm, we propose a bandit algorithm whose computation time is (log ), thereby achieving an exponential speedup over linear bandit algorithms. We provide a sample complexity upper bound that is still worst-case optimal. Finally, we conduct experiments on large-scale data sets with more than 10 super arms, demonstrating the superiority of our algorithms in terms of both the computation time and the sample complexity.
我们研究了随机多臂识别问题,其中代理者从给定的 个臂中顺序探索一个大小为 的子集(也称为 ),并试图识别最佳超级臂。到目前为止,大多数工作都考虑了半臂问题设置,其中代理者可以观察每个拉出的臂的奖励,或者假设每个臂在每一轮都可以被查询。然而,在实际应用中,观察单个臂的奖励是昂贵的,有时甚至是不可能的。在这项研究中,我们解决了全臂问题设置,其中在每次抽取时只给出超级臂的总和的一个噪声观测值。虽然我们的问题可以被视为线性带臂中的最佳臂识别的一个实例,但基于线性带臂的简单方法在计算上是不可行的,因为超级臂的数量是指数级的。为了解决这个问题,我们首先设计了一种用于置信椭球最大化中出现的 0-1 二次规划问题的多项式时间逼近算法。基于我们的逼近算法,我们提出了一种带臂算法,其计算时间为 (log ),从而比线性带臂算法实现了指数级的加速。我们提供了一个样本复杂度上界,仍然是最坏情况下最优的。最后,我们在具有超过 10 个超级臂的大规模数据集上进行了实验,证明了我们的算法在计算时间和样本复杂度方面的优越性。