Suppr超能文献

非平稳多臂老虎机问题的汤普森采样

Thompson Sampling for Non-Stationary Bandit Problems.

作者信息

Qi Han, Guo Fei, Zhu Li

机构信息

School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China.

出版信息

Entropy (Basel). 2025 Jan 9;27(1):51. doi: 10.3390/e27010051.

Abstract

Non-stationary multi-armed bandit (MAB) problems have recently attracted extensive attention. We focus on the abruptly changing scenario where reward distributions remain constant for a certain period and change at unknown time steps. Although Thompson sampling (TS) has shown success in non-stationary settings, there is currently no regret bound analysis for TS with uninformative priors. To address this, we propose two algorithms, discounted TS and sliding-window TS, designed for sub-Gaussian reward distributions. For these algorithms, we establish an upper bound for the expected regret by bounding the expected number of times a suboptimal arm is played. We show that the regret upper bounds of both algorithms are O~(TBT), where is the time horizon and BT is the number of breakpoints. This upper bound matches the lower bound for abruptly changing problems up to a logarithmic factor. Empirical comparisons with other non-stationary bandit algorithms highlight the competitive performance of our proposed methods.

摘要

非平稳多臂老虎机(MAB)问题近来受到了广泛关注。我们聚焦于奖励分布在特定时间段内保持不变且在未知时间步发生变化的突变场景。尽管汤普森采样(TS)在非平稳设置中已展现出成效,但目前尚无针对具有无信息先验的TS的遗憾界分析。为解决这一问题,我们提出了两种算法,即折扣TS和滑动窗口TS,它们专为次高斯奖励分布而设计。对于这些算法,我们通过限制次优臂被拉动的期望次数来建立期望遗憾的上界。我们证明,这两种算法的遗憾上界均为O~(TBT),其中T是时间范围,BT是断点数量。该上界与突变问题的下界在对数因子范围内相匹配。与其他非平稳老虎机算法的实证比较突出了我们所提出方法的竞争性能。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9018/11765042/351483cae485/entropy-27-00051-g001.jpg

相似文献

3
An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward.已知最优平均回报的随机带臂赌博机的最优算法。
IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2285-2291. doi: 10.1109/TNNLS.2020.2995920. Epub 2021 May 3.
4
A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.一种针对单峰高斯博弈且具有对数遗憾值的汤普森采样算法。
IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5332-5341. doi: 10.1109/TNNLS.2023.3295360. Epub 2023 Sep 1.
6
An empirical evaluation of active inference in multi-armed bandits.多臂赌博机中主动推理的实证评估。
Neural Netw. 2021 Dec;144:229-246. doi: 10.1016/j.neunet.2021.08.018. Epub 2021 Aug 26.
7
Self-Unaware Adversarial Multi-Armed Bandits With Switching Costs.具有切换成本的自我 unaware 对抗性多臂老虎机
IEEE Trans Neural Netw Learn Syst. 2023 Jun;34(6):2908-2922. doi: 10.1109/TNNLS.2021.3110194. Epub 2023 Jun 1.

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验