
An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem.

Authors

Gokcesu Kaan, Kozat Suleyman Serdar

Publication

IEEE Trans Neural Netw Learn Syst. 2018 Nov;29(11):5565-5580. doi: 10.1109/TNNLS.2018.2806006. Epub 2018 Mar 8.

DOI: 10.1109/TNNLS.2018.2806006
PMID: 29994080
Abstract

We investigate the adversarial multiarmed bandit problem and introduce an online algorithm that asymptotically achieves the performance of the best switching bandit arm selection strategy. Our algorithms are truly online such that we do not use the game length or the number of switches of the best arm selection strategy in their constructions. Our results are guaranteed to hold in an individual sequence manner, since we have no statistical assumptions on the bandit arm losses. Our regret bounds, i.e., our performance bounds with respect to the best bandit arm selection strategy, are minimax optimal up to logarithmic terms. We achieve the minimax optimal regret with computational complexity only log-linear in the game length. Thus, our algorithms can be efficiently used in applications involving big data. Through an extensive set of experiments involving synthetic and real data, we demonstrate significant performance gains achieved by the proposed algorithm with respect to the state-of-the-art switching bandit algorithms. We also introduce a general efficiently implementable bandit arm selection framework, which can be adapted to various applications.
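The paper's algorithm extends adversarial bandit methods to compete with the best switching arm-selection strategy at log-linear cost; its exact construction is not given in the abstract. As background, the classic Exp3 baseline for the (non-switching) adversarial bandit can be sketched as follows. This is a minimal illustration, not the paper's method; the function name, the `loss_fn` callback, and the fixed exploration rate `gamma` are assumptions for the sketch, and losses are assumed to lie in [0, 1].

```python
import math
import random

def exp3(n_arms, horizon, loss_fn, gamma=0.1):
    """Exp3 sketch for the adversarial multiarmed bandit.

    Maintains exponential weights over arms, mixes them with uniform
    exploration, and updates only the pulled arm using an
    importance-weighted (unbiased) estimate of its loss.
    """
    weights = [1.0] * n_arms
    total_loss = 0.0
    for t in range(horizon):
        w_sum = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)        # adversary reveals loss in [0, 1]
        total_loss += loss
        # Importance-weighted estimate: only the pulled arm is updated.
        est = loss / probs[arm]
        weights[arm] *= math.exp(-gamma * est / n_arms)
    return total_loss
```

Because the update uses only the pulled arm's observed loss, per-round work is O(number of arms); the paper's contribution lies in matching the stronger switching-strategy benchmark while keeping complexity log-linear in the game length.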


Similar Articles

1. An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem.
   IEEE Trans Neural Netw Learn Syst. 2018 Nov;29(11):5565-5580. doi: 10.1109/TNNLS.2018.2806006. Epub 2018 Mar 8.
2. Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures.
   IEEE Trans Neural Netw Learn Syst. 2019 Mar;30(3):923-937. doi: 10.1109/TNNLS.2018.2854796. Epub 2018 Aug 2.
3. Online Density Estimation of Nonstationary Sources Using Exponential Family of Distributions.
   IEEE Trans Neural Netw Learn Syst. 2018 Sep;29(9):4473-4478. doi: 10.1109/TNNLS.2017.2740003. Epub 2017 Sep 13.
4. An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward.
   IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2285-2291. doi: 10.1109/TNNLS.2020.2995920. Epub 2021 May 3.
5. Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
   Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
6. Polynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback.
   Neural Comput. 2020 Sep;32(9):1733-1773. doi: 10.1162/neco_a_01299. Epub 2020 Jul 20.
7. A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.
   IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5332-5341. doi: 10.1109/TNNLS.2023.3295360. Epub 2023 Sep 1.
8. Multiarmed Bandit Algorithms on Zynq System-on-Chip: Go Frequentist or Bayesian?
   IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2602-2615. doi: 10.1109/TNNLS.2022.3190509. Epub 2024 Feb 5.
9. PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison.
   IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15308-15327. doi: 10.1109/TPAMI.2023.3305381. Epub 2023 Nov 3.
10. Greedy Methods, Randomization Approaches, and Multiarm Bandit Algorithms for Efficient Sparsity-Constrained Optimization.
    IEEE Trans Neural Netw Learn Syst. 2017 Nov;28(11):2789-2802. doi: 10.1109/TNNLS.2016.2600243. Epub 2016 Sep 9.