文献检索，用中文搜 PubMed

Suppr 超能文献

核心技术专利：CN118964589B侵权必究

Suppr 超能文献

核心技术专利：CN118964589B侵权必究

Yang Long, Li Zhao, Hu Zehong, Ruan Shasha, Pan Gang

IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5332-5341. doi: 10.1109/TNNLS.2023.3295360. Epub 2023 Sep 1.

In this article, we propose a Thompson sampling algorithm with Gaussian prior for unimodal bandit under Gaussian reward setting, where the expected reward is unimodal over the partially ordered arms. To exploit the unimodal structure better, at each step, instead of exploration from the entire decision space, the proposed algorithm makes decisions according to posterior distribution only in the arm's neighborhood with the highest empirical mean estimate. We theoretically prove that the asymptotic regret of our algorithm reaches O(logT) , i.e., it shares the same regret order with asymptotic optimal algorithms, which is comparable to extensive existing state-of-the-art unimodal multiarm bandit (U-MAB) algorithms. Finally, we use extensive experiments to demonstrate the effectiveness of the proposed algorithm on both synthetic datasets and real-world applications.

在本文中，我们针对高斯奖励设置下的单峰博弈提出了一种具有高斯先验的汤普森采样算法，其中期望奖励在部分有序臂上是单峰的。为了更好地利用单峰结构，在每一步，所提出的算法不是从整个决策空间进行探索，而是仅根据后验分布在具有最高经验均值估计的臂的邻域内做出决策。我们从理论上证明了我们算法的渐近遗憾达到(O(\log T))，即它与渐近最优算法具有相同的遗憾阶数，这与现有的大量先进单峰多臂博弈（U - MAB）算法相当。最后，我们通过大量实验证明了所提出算法在合成数据集和实际应用中的有效性。

Yang Long, Li Zhao, Hu Zehong, Ruan Shasha, Pan Gang

IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5332-5341. doi: 10.1109/TNNLS.2023.3295360. Epub 2023 Sep 1.

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

一种针对单峰高斯博弈且具有对数遗憾值的汤普森采样算法。

A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.

作者信息

出版信息

相似文献

引用本文的文献

一种针对单峰高斯博弈且具有对数遗憾值的汤普森采样算法。

A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.

作者信息

出版信息

相似文献

引用本文的文献