Suppr超能文献

具有潜在特征的广义上下文博弈:算法与应用

Generalized Contextual Bandits With Latent Features: Algorithms and Applications.

作者信息

Xu Xiongxiao, Xie Hong, Lui John C S

出版信息

IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4763-4775. doi: 10.1109/TNNLS.2021.3124603. Epub 2023 Aug 4.

Abstract

Contextual bandit is a popular sequential decision-making framework to balance the exploration and exploitation tradeoff in many applications such as recommender systems, search engines, etc. Motivated by two important factors in real-world applications: 1) latent contexts (or features) often exist and 2) feedbacks often have humans in the loop leading to human biases, we formulate a generalized contextual bandit framework with latent contexts. Our proposed framework includes a two-layer probabilistic interpretable model for the feedbacks from human with latent features. We design a GCL-PS algorithm for the proposed framework, which utilizes posterior sampling to balance the exploration and exploitation tradeoff. We prove a sublinear regret upper bound for GCL-PS, and prove a lower bound for the proposed bandit framework revealing insights on the optimality of GCL-PS. To further improve the computational efficiency of GCL-PS, we propose a Markov Chain Monte Carlo (MCMC) algorithm to generate approximate samples, resulting in our GCL-PSMC algorithm. We not only prove a sublinear Bayesian regret upper bound for our GCL-PSMC algorithm, but also reveal insights into the tradeoff between computational efficiency and sequential decision accuracy. Finally, we apply the proposed framework to hotel recommendations and news article recommendations, and show its superior performance over a variety of baselines via experiments on two public datasets.

摘要

上下文博弈是一种流行的序列决策框架,用于在许多应用(如推荐系统、搜索引擎等)中平衡探索与利用之间的权衡。受现实世界应用中的两个重要因素的启发:1)潜在上下文(或特征)经常存在,2)反馈通常涉及人类,从而导致人为偏差,我们提出了一个带有潜在上下文的广义上下文博弈框架。我们提出的框架包括一个用于具有潜在特征的人类反馈的两层概率可解释模型。我们为提出的框架设计了一种GCL-PS算法,该算法利用后验采样来平衡探索与利用之间的权衡。我们证明了GCL-PS的次线性遗憾上界,并证明了所提出的博弈框架的下界,揭示了GCL-PS最优性的见解。为了进一步提高GCL-PS的计算效率,我们提出了一种马尔可夫链蒙特卡罗(MCMC)算法来生成近似样本,从而得到我们的GCL-PSMC算法。我们不仅证明了我们的GCL-PSMC算法的次线性贝叶斯遗憾上界,还揭示了计算效率与序列决策准确性之间权衡的见解。最后,我们将提出的框架应用于酒店推荐和新闻文章推荐,并通过在两个公共数据集上的实验表明其优于各种基线的性能。

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验