On Gap-Based Lower Bounding Techniques for Best-Arm Identification.

Authors

Truong Lan V, Scarlett Jonathan

Affiliations

Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK.

Department of Computer Science & Department of Mathematics, National University of Singapore, Singapore 117418, Singapore.

Publication information

Entropy (Basel). 2020 Jul 20;22(7):788. doi: 10.3390/e22070788.

DOI:10.3390/e22070788

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7517353/

Abstract

In this paper, we consider techniques for establishing lower bounds on the number of arm pulls for best-arm identification in the multi-armed bandit problem. While a recent divergence-based approach was shown to provide improvements over an older gap-based approach, we show that the latter can be refined to match the former (up to constant factors) in many cases of interest under Bernoulli rewards, including the case that the rewards are bounded away from zero and one. Together with existing upper bounds, this indicates that the divergence-based and gap-based approaches are both effective for establishing sample complexity lower bounds for best-arm identification.
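
As a rough illustration of why the two approaches can agree up to constants when Bernoulli means stay away from zero and one, the sketch below compares two hardness quantities that are standard in the bandit literature: a gap-based sum of 1/gap^2 and a divergence-based sum of 1/KL. These quantities, the function names, and the example means are assumptions chosen for illustration; they are not taken from the paper and need not match the exact bounds it proves.

```python
import math

def bernoulli_kl(p, q):
    """KL(Ber(p) || Ber(q)) in nats; assumes 0 < q < 1."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

def gap_complexity(means):
    """Sum of 1/gap^2 over arms strictly worse than the best arm."""
    best = max(means)
    return sum(1.0 / (best - m) ** 2 for m in means if m < best)

def divergence_complexity(means):
    """Sum of 1/KL(arm || best) over arms strictly worse than the best arm."""
    best = max(means)
    return sum(1.0 / bernoulli_kl(m, best) for m in means if m < best)

# Hypothetical instance: all means bounded away from 0 and 1.
means = [0.6, 0.5, 0.45, 0.3]
h_gap = gap_complexity(means)
h_kl = divergence_complexity(means)
print(f"gap-based: {h_gap:.1f}, divergence-based: {h_kl:.1f}")
```

By Pinsker's inequality, KL(p || q) >= 2(p - q)^2, so the divergence-based quantity is at most half the gap-based one. In the other direction, KL(p || q) <= (p - q)^2 / (q(1 - q)), so the two differ by at most a constant factor whenever the best mean q is bounded away from zero and one. When means approach zero or one, that factor degrades, and the quantities can separate.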

Similar articles

1

On Gap-Based Lower Bounding Techniques for Best-Arm Identification.

Entropy (Basel). 2020 Jul 20;22(7):788. doi: 10.3390/e22070788.

2

An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits.

Entropy (Basel). 2018 Feb 28;20(3):155. doi: 10.3390/e20030155.

3

Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?

Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.

4

Polynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback.

Neural Comput. 2020 Sep;32(9):1733-1773. doi: 10.1162/neco_a_01299. Epub 2020 Jul 20.

5

A Contextual-Bandit-Based Approach for Informed Decision-Making in Clinical Trials.

Life (Basel). 2022 Aug 21;12(8):1277. doi: 10.3390/life12081277.

6

Optimism in the face of uncertainty supported by a statistically-designed multi-armed bandit algorithm.

Biosystems. 2017 Oct;160:25-32. doi: 10.1016/j.biosystems.2017.08.004. Epub 2017 Aug 22.

7

Adaptive designs for best treatment identification with top-two Thompson sampling and acceleration.

Pharm Stat. 2023 Nov-Dec;22(6):1089-1103. doi: 10.1002/pst.2331. Epub 2023 Aug 12.

8

Information-Theoretic Generalization Bounds for Meta-Learning and Applications.

Entropy (Basel). 2021 Jan 19;23(1):126. doi: 10.3390/e23010126.

9

PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison.

IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15308-15327. doi: 10.1109/TPAMI.2023.3305381. Epub 2023 Nov 3.

10

An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem.

IEEE Trans Neural Netw Learn Syst. 2018 Nov;29(11):5565-5580. doi: 10.1109/TNNLS.2018.2806006. Epub 2018 Mar 8.

References cited in this article

1

Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.

Stat Sci. 2015;30(2):199-215. doi: 10.1214/14-STS504.
