Polynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback.

Affiliations

University of Tokyo, Bunkyo-ku, Tokyo, 113-0333, Japan, and RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo 103-0027, Japan

RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo 103-0027, Japan

Publication information

Neural Comput. 2020 Sep;32(9):1733-1773. doi: 10.1162/neco_a_01299. Epub 2020 Jul 20.

DOI:10.1162/neco_a_01299
PMID:32687769
Abstract

We study the problem of stochastic multiple-arm identification, where an agent sequentially explores a size-$k$ subset of arms (also known as a super arm) from $n$ given arms and tries to identify the best super arm. Most work so far has considered the semi-bandit setting, where the agent can observe the reward of each pulled arm, or has assumed that each arm can be queried at each round. However, in real-world applications, it is costly or sometimes impossible to observe the reward of individual arms. In this study, we tackle the full-bandit setting, where only a noisy observation of the total sum of a super arm is given at each pull. Although our problem can be regarded as an instance of best arm identification in linear bandits, a naive approach based on linear bandits is computationally infeasible since the number of super arms is exponential. To cope with this problem, we first design a polynomial-time approximation algorithm for a 0-1 quadratic programming problem arising in confidence ellipsoid maximization. Based on our approximation algorithm, we propose a bandit algorithm whose computation time is $O(\log n)$, thereby achieving an exponential speedup over linear bandit algorithms. We provide a sample complexity upper bound that is still worst-case optimal. Finally, we conduct experiments on large-scale data sets with more than $10^{10}$ super arms, demonstrating the superiority of our algorithms in terms of both the computation time and the sample complexity.
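
To make the full-bandit setting concrete, the following is a minimal sketch in Python. It is not the algorithm proposed in the paper. It is a brute-force baseline that samples every super arm, included only to illustrate why enumeration stops being feasible as the number of super arms grows. The function names (`num_super_arms`, `pull`, `naive_best_super_arm`), the Gaussian noise model, and the example arm means are all assumptions made for illustration.

```python
import itertools
import math
import random


def num_super_arms(n, k):
    """Number of size-k subsets of n arms, i.e. candidate super arms."""
    return math.comb(n, k)


def pull(super_arm, means, rng, noise_sd=0.1):
    """Full-bandit feedback: only the noisy total reward is observed.

    The Gaussian noise model is an assumption for this sketch.
    """
    return sum(means[i] for i in super_arm) + rng.gauss(0.0, noise_sd)


def naive_best_super_arm(means, k, pulls_per_arm=50, seed=0):
    """Brute force: sample every super arm, keep the best empirical mean.

    The loop visits all C(n, k) super arms, so its cost grows
    exponentially with n when k is proportional to n.
    """
    rng = random.Random(seed)
    best, best_avg = None, -math.inf
    for super_arm in itertools.combinations(range(len(means)), k):
        total = sum(pull(super_arm, means, rng) for _ in range(pulls_per_arm))
        avg = total / pulls_per_arm
        if avg > best_avg:
            best, best_avg = super_arm, avg
    return best


if __name__ == "__main__":
    means = [0.9, 0.8, 0.1, 0.2, 0.5]  # hypothetical arm means
    print(naive_best_super_arm(means, k=2))
    print(num_super_arms(50, 25))  # already far more than 10**10
```

With only 5 arms and k = 2 there are 10 super arms, so the brute-force loop is cheap. With 50 arms and k = 25 the same loop would have to visit more than 10^14 super arms, which is the kind of blow-up that motivates a polynomial-time method.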


Similar articles

1
Polynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback.
Neural Comput. 2020 Sep;32(9):1733-1773. doi: 10.1162/neco_a_01299. Epub 2020 Jul 20.
2
An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward.
IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2285-2291. doi: 10.1109/TNNLS.2020.2995920. Epub 2021 May 3.
3
An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem.
IEEE Trans Neural Netw Learn Syst. 2018 Nov;29(11):5565-5580. doi: 10.1109/TNNLS.2018.2806006. Epub 2018 Mar 8.
4
Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures.
IEEE Trans Neural Netw Learn Syst. 2019 Mar;30(3):923-937. doi: 10.1109/TNNLS.2018.2854796. Epub 2018 Aug 2.
5
Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
6
Self-Unaware Adversarial Multi-Armed Bandits With Switching Costs.
IEEE Trans Neural Netw Learn Syst. 2023 Jun;34(6):2908-2922. doi: 10.1109/TNNLS.2021.3110194. Epub 2023 Jun 1.
7
Covariance Matrix Adaptation for Multiobjective Multiarmed Bandits.
IEEE Trans Neural Netw Learn Syst. 2019 Aug;30(8):2493-2502. doi: 10.1109/TNNLS.2018.2885123. Epub 2018 Dec 28.
8
A Contextual-Bandit-Based Approach for Informed Decision-Making in Clinical Trials.
Life (Basel). 2022 Aug 21;12(8):1277. doi: 10.3390/life12081277.
9
A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits.
IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):9887-9899. doi: 10.1109/TNNLS.2022.3161806. Epub 2023 Nov 30.
10
Some performance considerations when using multi-armed bandit algorithms in the presence of missing data.
PLoS One. 2022 Sep 12;17(9):e0274272. doi: 10.1371/journal.pone.0274272. eCollection 2022.