A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.

Author information

Yang Long, Li Zhao, Hu Zehong, Ruan Shasha, Pan Gang

Publication information

IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5332-5341. doi: 10.1109/TNNLS.2023.3295360. Epub 2023 Sep 1.

DOI:10.1109/TNNLS.2023.3295360
PMID:37527328
Abstract

In this article, we propose a Thompson sampling algorithm with a Gaussian prior for the unimodal bandit under a Gaussian reward setting, where the expected reward is unimodal over the partially ordered arms. To better exploit the unimodal structure, at each step the proposed algorithm does not explore the entire decision space; instead, it makes decisions according to the posterior distribution only within the neighborhood of the arm with the highest empirical mean estimate. We theoretically prove that the asymptotic regret of our algorithm reaches O(log T), i.e., it shares the same regret order as asymptotically optimal algorithms and is comparable to many existing state-of-the-art unimodal multi-armed bandit (U-MAB) algorithms. Finally, we use extensive experiments on both synthetic datasets and real-world applications to demonstrate the effectiveness of the proposed algorithm.
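The neighborhood-restricted decision rule described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation. It assumes the arms lie on a line (a total order, the simplest partial order), rewards are Gaussian with a known standard deviation `sigma`, each arm has an independent N(0, sigma^2) prior, and every arm is pulled once before the Thompson steps begin. The function name `unimodal_gaussian_ts` and its parameters are hypothetical, and the paper's algorithm may contain details (for example, extra rules for playing the leader) that are omitted here.

```python
import math
import random


def unimodal_gaussian_ts(means, horizon, sigma=1.0, seed=0):
    """Hypothetical neighborhood-restricted Gaussian Thompson sampler (sketch).

    Assumptions (not taken from the paper): arms form a line graph 0..K-1,
    rewards are N(means[k], sigma^2) with sigma known, and each arm has an
    independent N(0, sigma^2) prior, so after n pulls with reward sum s the
    posterior is N(s / (n + 1), sigma^2 / (n + 1)).
    Returns the cumulative pseudo-regret after each round.
    """
    rng = random.Random(seed)
    k = len(means)
    best = max(means)
    counts = [0] * k
    sums = [0.0] * k
    regret = []
    total = 0.0

    def pull(arm):
        nonlocal total
        reward = rng.gauss(means[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
        total += best - means[arm]  # pseudo-regret uses the true means
        regret.append(total)

    # Initialization (assumption): pull every arm once so each has an empirical mean.
    for arm in range(k):
        if len(regret) < horizon:
            pull(arm)

    while len(regret) < horizon:
        # Leader: the arm with the highest empirical mean estimate.
        leader = max(range(k), key=lambda a: sums[a] / counts[a])
        # Neighborhood on a line graph: the leader and its adjacent arms.
        neighborhood = [a for a in (leader - 1, leader, leader + 1) if 0 <= a < k]
        # Thompson step restricted to the neighborhood: sample each posterior, play the argmax.
        samples = {
            a: rng.gauss(sums[a] / (counts[a] + 1), sigma / math.sqrt(counts[a] + 1))
            for a in neighborhood
        }
        pull(max(samples, key=samples.get))
    return regret
```

For example, `unimodal_gaussian_ts([0.1, 0.3, 0.5, 0.7, 0.9, 0.6, 0.2], 5000)` returns the cumulative pseudo-regret over 5,000 rounds; plotting it against the round index on a logarithmic x-axis is one informal way to look for the logarithmic growth the abstract claims. Because posterior samples are drawn only for the leader and its neighbors, arms far from the current leader are never played directly, which is how this sketch uses the unimodal structure.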


Similar articles

1
A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.
IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5332-5341. doi: 10.1109/TNNLS.2023.3295360. Epub 2023 Sep 1.
2
An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward.
IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2285-2291. doi: 10.1109/TNNLS.2020.2995920. Epub 2021 May 3.
3
Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm.
Entropy (Basel). 2021 Mar 23;23(3):380. doi: 10.3390/e23030380.
4
Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
5
An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem.
IEEE Trans Neural Netw Learn Syst. 2018 Nov;29(11):5565-5580. doi: 10.1109/TNNLS.2018.2806006. Epub 2018 Mar 8.
6
Multiarmed Bandit Algorithms on Zynq System-on-Chip: Go Frequentist or Bayesian?
IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2602-2615. doi: 10.1109/TNNLS.2022.3190509. Epub 2024 Feb 5.
7
Multi-Armed Bandit-Based User Network Node Selection.
Sensors (Basel). 2024 Jun 24;24(13):4104. doi: 10.3390/s24134104.
8
A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits.
IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):9887-9899. doi: 10.1109/TNNLS.2022.3161806. Epub 2023 Nov 30.
9
Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control.
Technometrics. 2023;65(1):33-43. doi: 10.1080/00401706.2022.2054861. Epub 2022 Apr 22.
10
Generalized Contextual Bandits With Latent Features: Algorithms and Applications.
IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4763-4775. doi: 10.1109/TNNLS.2021.3124603. Epub 2023 Aug 4.

Cited by

1
Counterclockwise block-by-block knowledge distillation for neural network compression.
Sci Rep. 2025 Apr 3;15(1):11369. doi: 10.1038/s41598-025-91152-3.