Xu Xiongxiao, Xie Hong, Lui John C S
IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4763-4775. doi: 10.1109/TNNLS.2021.3124603. Epub 2023 Aug 4.
Contextual bandit is a popular sequential decision-making framework to balance the exploration and exploitation tradeoff in many applications such as recommender systems, search engines, etc. Motivated by two important factors in real-world applications: 1) latent contexts (or features) often exist and 2) feedbacks often have humans in the loop leading to human biases, we formulate a generalized contextual bandit framework with latent contexts. Our proposed framework includes a two-layer probabilistic interpretable model for the feedbacks from human with latent features. We design a GCL-PS algorithm for the proposed framework, which utilizes posterior sampling to balance the exploration and exploitation tradeoff. We prove a sublinear regret upper bound for GCL-PS, and prove a lower bound for the proposed bandit framework revealing insights on the optimality of GCL-PS. To further improve the computational efficiency of GCL-PS, we propose a Markov Chain Monte Carlo (MCMC) algorithm to generate approximate samples, resulting in our GCL-PSMC algorithm. We not only prove a sublinear Bayesian regret upper bound for our GCL-PSMC algorithm, but also reveal insights into the tradeoff between computational efficiency and sequential decision accuracy. Finally, we apply the proposed framework to hotel recommendations and news article recommendations, and show its superior performance over a variety of baselines via experiments on two public datasets.
上下文博弈是一种流行的序列决策框架,用于在许多应用(如推荐系统、搜索引擎等)中平衡探索与利用之间的权衡。受现实世界应用中的两个重要因素的启发:1)潜在上下文(或特征)经常存在,2)反馈通常涉及人类,从而导致人为偏差,我们提出了一个带有潜在上下文的广义上下文博弈框架。我们提出的框架包括一个用于具有潜在特征的人类反馈的两层概率可解释模型。我们为提出的框架设计了一种GCL-PS算法,该算法利用后验采样来平衡探索与利用之间的权衡。我们证明了GCL-PS的次线性遗憾上界,并证明了所提出的博弈框架的下界,揭示了GCL-PS最优性的见解。为了进一步提高GCL-PS的计算效率,我们提出了一种马尔可夫链蒙特卡罗(MCMC)算法来生成近似样本,从而得到我们的GCL-PSMC算法。我们不仅证明了我们的GCL-PSMC算法的次线性贝叶斯遗憾上界,还揭示了计算效率与序列决策准确性之间权衡的见解。最后,我们将提出的框架应用于酒店推荐和新闻文章推荐,并通过在两个公共数据集上的实验表明其优于各种基线的性能。