Suppr超能文献

一种针对单峰高斯博弈且具有对数遗憾值的汤普森采样算法。

A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.

作者信息

Yang Long, Li Zhao, Hu Zehong, Ruan Shasha, Pan Gang

出版信息

IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5332-5341. doi: 10.1109/TNNLS.2023.3295360. Epub 2023 Sep 1.

Abstract

In this article, we propose a Thompson sampling algorithm with Gaussian prior for unimodal bandit under Gaussian reward setting, where the expected reward is unimodal over the partially ordered arms. To exploit the unimodal structure better, at each step, instead of exploration from the entire decision space, the proposed algorithm makes decisions according to posterior distribution only in the arm's neighborhood with the highest empirical mean estimate. We theoretically prove that the asymptotic regret of our algorithm reaches O(logT) , i.e., it shares the same regret order with asymptotic optimal algorithms, which is comparable to extensive existing state-of-the-art unimodal multiarm bandit (U-MAB) algorithms. Finally, we use extensive experiments to demonstrate the effectiveness of the proposed algorithm on both synthetic datasets and real-world applications.

摘要

在本文中,我们针对高斯奖励设置下的单峰博弈提出了一种具有高斯先验的汤普森采样算法,其中期望奖励在部分有序臂上是单峰的。为了更好地利用单峰结构,在每一步,所提出的算法不是从整个决策空间进行探索,而是仅根据后验分布在具有最高经验均值估计的臂的邻域内做出决策。我们从理论上证明了我们算法的渐近遗憾达到(O(\log T)),即它与渐近最优算法具有相同的遗憾阶数,这与现有的大量先进单峰多臂博弈(U - MAB)算法相当。最后,我们通过大量实验证明了所提出算法在合成数据集和实际应用中的有效性。

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验