• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

非平稳多臂老虎机问题的汤普森采样

Thompson Sampling for Non-Stationary Bandit Problems.

作者信息

Qi Han, Guo Fei, Zhu Li

机构信息

School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China.

出版信息

Entropy (Basel). 2025 Jan 9;27(1):51. doi: 10.3390/e27010051.

DOI:10.3390/e27010051
PMID:39851672
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC11765042/
Abstract

Non-stationary multi-armed bandit (MAB) problems have recently attracted extensive attention. We focus on the abruptly changing scenario where reward distributions remain constant for a certain period and change at unknown time steps. Although Thompson sampling (TS) has shown success in non-stationary settings, there is currently no regret bound analysis for TS with uninformative priors. To address this, we propose two algorithms, discounted TS and sliding-window TS, designed for sub-Gaussian reward distributions. For these algorithms, we establish an upper bound for the expected regret by bounding the expected number of times a suboptimal arm is played. We show that the regret upper bounds of both algorithms are O~(TBT), where is the time horizon and BT is the number of breakpoints. This upper bound matches the lower bound for abruptly changing problems up to a logarithmic factor. Empirical comparisons with other non-stationary bandit algorithms highlight the competitive performance of our proposed methods.

摘要

非平稳多臂老虎机(MAB)问题近来受到了广泛关注。我们聚焦于奖励分布在特定时间段内保持不变且在未知时间步发生变化的突变场景。尽管汤普森采样(TS)在非平稳设置中已展现出成效,但目前尚无针对具有无信息先验的TS的遗憾界分析。为解决这一问题,我们提出了两种算法,即折扣TS和滑动窗口TS,它们专为次高斯奖励分布而设计。对于这些算法,我们通过限制次优臂被拉动的期望次数来建立期望遗憾的上界。我们证明,这两种算法的遗憾上界均为O~(TBT),其中T是时间范围,BT是断点数量。该上界与突变问题的下界在对数因子范围内相匹配。与其他非平稳老虎机算法的实证比较突出了我们所提出方法的竞争性能。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9018/11765042/1797dd72ae91/entropy-27-00051-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9018/11765042/351483cae485/entropy-27-00051-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9018/11765042/a45a127b9390/entropy-27-00051-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9018/11765042/a60e73c7ffc5/entropy-27-00051-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9018/11765042/1797dd72ae91/entropy-27-00051-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9018/11765042/351483cae485/entropy-27-00051-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9018/11765042/a45a127b9390/entropy-27-00051-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9018/11765042/a60e73c7ffc5/entropy-27-00051-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9018/11765042/1797dd72ae91/entropy-27-00051-g004.jpg

相似文献

1
Thompson Sampling for Non-Stationary Bandit Problems.非平稳多臂老虎机问题的汤普森采样
Entropy (Basel). 2025 Jan 9;27(1):51. doi: 10.3390/e27010051.
2
Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm.非平稳多臂赌博机:一种新概念漂移感知算法的实证评估
Entropy (Basel). 2021 Mar 23;23(3):380. doi: 10.3390/e23030380.
3
An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward.已知最优平均回报的随机带臂赌博机的最优算法。
IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2285-2291. doi: 10.1109/TNNLS.2020.2995920. Epub 2021 May 3.
4
A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.一种针对单峰高斯博弈且具有对数遗憾值的汤普森采样算法。
IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5332-5341. doi: 10.1109/TNNLS.2023.3295360. Epub 2023 Sep 1.
5
Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?基于筛沙机制的超越方法:为何乐观值函数能在多臂老虎机问题中找到最优解?
Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
6
An empirical evaluation of active inference in multi-armed bandits.多臂赌博机中主动推理的实证评估。
Neural Netw. 2021 Dec;144:229-246. doi: 10.1016/j.neunet.2021.08.018. Epub 2021 Aug 26.
7
Self-Unaware Adversarial Multi-Armed Bandits With Switching Costs.具有切换成本的自我 unaware 对抗性多臂老虎机
IEEE Trans Neural Netw Learn Syst. 2023 Jun;34(6):2908-2922. doi: 10.1109/TNNLS.2021.3110194. Epub 2023 Jun 1.
8
Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis.带噪声上下文的随机博弈的汤普森采样:一种信息论后悔分析
Entropy (Basel). 2024 Jul 17;26(7):606. doi: 10.3390/e26070606.
9
The Perils of Misspecified Priors and Optional Stopping in Multi-Armed Bandits.多臂老虎机中先验设定错误和选择性停止的风险。
Front Artif Intell. 2021 Jul 9;4:715690. doi: 10.3389/frai.2021.715690. eCollection 2021.
10
Multi-Armed Bandit-Based User Network Node Selection.基于多臂赌博机的用户网络节点选择
Sensors (Basel). 2024 Jun 24;24(13):4104. doi: 10.3390/s24134104.

本文引用的文献

1
Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm.非平稳多臂赌博机:一种新概念漂移感知算法的实证评估
Entropy (Basel). 2021 Mar 23;23(3):380. doi: 10.3390/e23030380.