• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

一种用于为情境博弈设计稳健算法的乘数自助法。

A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits.

作者信息

Xie Hong, Tang Qiao, Zhu Qingsheng

出版信息

IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):9887-9899. doi: 10.1109/TNNLS.2022.3161806. Epub 2023 Nov 30.

DOI:10.1109/TNNLS.2022.3161806
PMID:35385392
Abstract

Upper confidence bound (UCB)-based contextual bandit algorithms require one to know the tail property of the reward distribution. Unfortunately, such tail property is usually unknown or difficult to specify in real-world applications. Using a tail property heavier than the ground truth leads to a slow learning speed of the contextual bandit algorithm, while using a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator (evaluated from historical rewards) for the contextual bandit UCB based on the multiplier bootstrap technique. Our proposed estimator mitigates the problem of specifying a heavier tail property by adaptively converging to the ground truth contextual bandit UCB (i.e., eliminating the impact of the specified heavier tail property) with theoretical guarantees on the convergence. The design and convergence analysis of the proposed estimator is technically nontrivial. The proposed estimator is generic and it can be applied to improve a variety of UCB-based contextual bandit algorithms. To demonstrate the versatility of the proposed estimator, we apply it to improve the linear reward contextual bandit UCB (LinUCB) algorithm resulting in our bootstrapping LinUCB (BootLinUCB) algorithm. We prove that the BootLinUCB has a sublinear regret. We conduct extensive experiments on both synthetic dataset and real-world dataset from Yahoo! to validate the benefits of our proposed estimator in reducing regret and the superior performance of BootLinUCB over the latest baseline.

摘要

基于上置信界(UCB)的上下文博弈算法要求人们了解奖励分布的尾部性质。不幸的是,在实际应用中,这种尾部性质通常是未知的或难以确定的。使用比真实情况更重的尾部性质会导致上下文博弈算法的学习速度缓慢,而使用更轻的尾部性质可能会导致算法发散。为了解决这个基本问题,我们基于乘数自助法技术开发了一种用于上下文博弈UCB的估计器(根据历史奖励进行评估)。我们提出的估计器通过自适应地收敛到真实的上下文博弈UCB(即消除指定的更重尾部性质的影响)来减轻指定更重尾部性质的问题,并在收敛方面有理论保证。所提出的估计器的设计和收敛分析在技术上并非易事。所提出的估计器是通用的,可应用于改进各种基于UCB的上下文博弈算法。为了证明所提出估计器的通用性,我们将其应用于改进线性奖励上下文博弈UCB(LinUCB)算法,从而得到我们的自助LinUCB(BootLinUCB)算法。我们证明BootLinUCB具有次线性遗憾。我们在合成数据集和雅虎的真实世界数据集上进行了广泛的实验,以验证我们提出的估计器在减少遗憾方面的好处以及BootLinUCB相对于最新基线的优越性能。

相似文献

1
A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits.一种用于为情境博弈设计稳健算法的乘数自助法。
IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):9887-9899. doi: 10.1109/TNNLS.2022.3161806. Epub 2023 Nov 30.
2
Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?基于筛沙机制的超越方法:为何乐观值函数能在多臂老虎机问题中找到最优解?
Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
3
Generalized Contextual Bandits With Latent Features: Algorithms and Applications.具有潜在特征的广义上下文博弈:算法与应用
IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4763-4775. doi: 10.1109/TNNLS.2021.3124603. Epub 2023 Aug 4.
4
Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits.非平稳上下文带臂赌博机的递归神经线性后验抽样。
Neural Comput. 2022 Oct 7;34(11):2232-2272. doi: 10.1162/neco_a_01539.
5
Minimax Optimal Bandits for Heavy Tail Rewards.重尾奖励的极小极大最优策略
IEEE Trans Neural Netw Learn Syst. 2024 Apr;35(4):5280-5294. doi: 10.1109/TNNLS.2022.3203035. Epub 2024 Apr 4.
6
Inference for Batched Bandits.批量策略博弈的推断
Adv Neural Inf Process Syst. 2020 Dec;33:9818-9829.
7
A Contextual-Bandit-Based Approach for Informed Decision-Making in Clinical Trials.一种基于情境博弈的临床试验明智决策方法。
Life (Basel). 2022 Aug 21;12(8):1277. doi: 10.3390/life12081277.
8
An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward.已知最优平均回报的随机带臂赌博机的最优算法。
IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2285-2291. doi: 10.1109/TNNLS.2020.2995920. Epub 2021 May 3.
9
Wi-Fi Assisted Contextual Multi-Armed Bandit for Neighbor Discovery and Selection in Millimeter Wave Device to Device Communications.用于毫米波设备到设备通信中邻居发现与选择的Wi-Fi辅助上下文多臂赌博机算法
Sensors (Basel). 2021 Apr 17;21(8):2835. doi: 10.3390/s21082835.
10
Cascaded Algorithm Selection With Extreme-Region UCB Bandit.
IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):6782-6794. doi: 10.1109/TPAMI.2021.3094844. Epub 2022 Sep 14.

引用本文的文献

1
Stochastic modeling of obesity status in United States adults using Markov Chains: A nationally representative analysis of population health data from 2017-2020.使用马尔可夫链对美国成年人肥胖状况进行随机建模:对2017 - 2020年人口健康数据的全国代表性分析。
Obes Sci Pract. 2023 Jul 27;9(6):653-660. doi: 10.1002/osp4.697. eCollection 2023 Dec.
2
Covariate dependent Markov chains constructed with gradient boost modeling can effectively generate long-term predictions of obesity trends.基于梯度提升建模构建的协变量相关马尔可夫链,可以有效地生成肥胖趋势的长期预测。
BMC Res Notes. 2023 Nov 24;16(1):346. doi: 10.1186/s13104-023-06610-w.