


Generalized Contextual Bandits With Latent Features: Algorithms and Applications.

Authors

Xu Xiongxiao, Xie Hong, Lui John C S

Publication

IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4763-4775. doi: 10.1109/TNNLS.2021.3124603. Epub 2023 Aug 4.

DOI: 10.1109/TNNLS.2021.3124603
PMID: 34780337
Abstract

The contextual bandit is a popular sequential decision-making framework for balancing the exploration-exploitation tradeoff in applications such as recommender systems and search engines. Motivated by two important factors in real-world applications: 1) latent contexts (or features) often exist and 2) feedback often has humans in the loop, leading to human biases, we formulate a generalized contextual bandit framework with latent contexts. Our proposed framework includes a two-layer probabilistic interpretable model for feedback from humans with latent features. We design a GCL-PS algorithm for the proposed framework, which utilizes posterior sampling to balance the exploration-exploitation tradeoff. We prove a sublinear regret upper bound for GCL-PS, and prove a lower bound for the proposed bandit framework, revealing insights into the optimality of GCL-PS. To further improve the computational efficiency of GCL-PS, we propose a Markov Chain Monte Carlo (MCMC) algorithm to generate approximate samples, resulting in our GCL-PSMC algorithm. We not only prove a sublinear Bayesian regret upper bound for our GCL-PSMC algorithm, but also reveal insights into the tradeoff between computational efficiency and sequential decision accuracy. Finally, we apply the proposed framework to hotel recommendations and news article recommendations, and show its superior performance over a variety of baselines via experiments on two public datasets.
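To make the posterior-sampling idea concrete: the sketch below is a generic Thompson-sampling contextual bandit with a conjugate linear-Gaussian reward model, not the paper's GCL-PS algorithm (which uses a two-layer latent-feature model and, in GCL-PSMC, MCMC approximation). All dimensions, the noise scale, and the per-arm Gaussian posterior are illustrative assumptions; only the sample-then-act-greedily loop mirrors the posterior-sampling principle the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_arms, T = 5, 3, 2000
# Hidden true parameter per arm (unknown to the learner).
theta_true = rng.normal(size=(n_arms, d))

# Gaussian posterior per arm, tracked via precision matrix A and vector b,
# so the posterior mean is A^{-1} b (standard Bayesian linear regression).
A = np.stack([np.eye(d) for _ in range(n_arms)])
b = np.zeros((n_arms, d))

regret = 0.0
for t in range(T):
    x = rng.normal(size=d)  # observed context for this round
    # Posterior sampling: draw one parameter sample per arm ...
    samples = [
        rng.multivariate_normal(np.linalg.solve(A[k], b[k]), np.linalg.inv(A[k]))
        for k in range(n_arms)
    ]
    # ... and act greedily with respect to the sampled parameters.
    k = int(np.argmax([s @ x for s in samples]))
    reward = theta_true[k] @ x + rng.normal(scale=0.1)
    # Conjugate update of the chosen arm's posterior only.
    A[k] += np.outer(x, x)
    b[k] += reward * x
    regret += float(max(theta_true @ x) - theta_true[k] @ x)

# Per-round regret shrinks as the posteriors concentrate around theta_true.
print(round(regret / T, 3))
```

Because sampled parameters for poorly known arms have high variance, this rule explores automatically and exploits as uncertainty collapses, which is the same mechanism GCL-PS relies on over its latent-feature posterior.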


Similar Articles

1. Generalized Contextual Bandits With Latent Features: Algorithms and Applications.
IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4763-4775. doi: 10.1109/TNNLS.2021.3124603. Epub 2023 Aug 4.
2. A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits.
IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):9887-9899. doi: 10.1109/TNNLS.2022.3161806. Epub 2023 Nov 30.
3. An empirical evaluation of active inference in multi-armed bandits.
Neural Netw. 2021 Dec;144:229-246. doi: 10.1016/j.neunet.2021.08.018. Epub 2021 Aug 26.
4. Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits.
Neural Comput. 2022 Oct 7;34(11):2232-2272. doi: 10.1162/neco_a_01539.
5. A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.
IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5332-5341. doi: 10.1109/TNNLS.2023.3295360. Epub 2023 Sep 1.
6. Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis.
Entropy (Basel). 2024 Jul 17;26(7):606. doi: 10.3390/e26070606.
7. Master-Slave Deep Architecture for Top-K Multiarmed Bandits With Nonlinear Bandit Feedback and Diversity Constraints.
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17608-17619. doi: 10.1109/TNNLS.2023.3306801. Epub 2024 Dec 2.
8. Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
9. An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward.
IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2285-2291. doi: 10.1109/TNNLS.2020.2995920. Epub 2021 May 3.
10. Self-Unaware Adversarial Multi-Armed Bandits With Switching Costs.
IEEE Trans Neural Netw Learn Syst. 2023 Jun;34(6):2908-2922. doi: 10.1109/TNNLS.2021.3110194. Epub 2023 Jun 1.

Cited By

1. Online learning in bandits with predicted context.
Proc Mach Learn Res. 2024 May;238:2215-2223.