Inference for Batched Bandits.

Authors

Zhang Kelly W, Janson Lucas, Murphy Susan A

Affiliations

Department of Computer Science, Harvard University.

Departments of Statistics, Harvard University.

Publication

Adv Neural Inf Process Syst. 2020 Dec;33:9818-9829.

Abstract

As bandit algorithms are increasingly utilized in scientific studies and industrial applications, there is an associated increasing need for reliable inference methods based on the resulting adaptively-collected data. In this work, we develop methods for inference on data collected in batches using a bandit algorithm. We first prove that the ordinary least squares estimator (OLS), which is asymptotically normal on independently sampled data, is asymptotically not normal on data collected using standard bandit algorithms when there is no unique optimal arm. This asymptotic non-normality result implies that the naive assumption that the OLS estimator is approximately normal can lead to Type-1 error inflation and confidence intervals with below-nominal coverage probabilities. Second, we introduce the Batched OLS estimator (BOLS) that we prove is (1) asymptotically normal on data collected from both multi-arm and contextual bandits and (2) robust to non-stationarity in the baseline reward.
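The abstract does not spell out how BOLS is computed, so the following is only a minimal sketch of the batch-wise idea under stated assumptions: two arms, a known noise standard deviation `sigma`, and a policy that is updated only between batches. Within each batch the difference in arm means is standardized by its conditional standard deviation, and the per-batch Z-scores are then combined. The function names, the clipped greedy policy, and all parameters are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def simulate_batched_bandit(num_batches=5, batch_size=100, means=(0.0, 0.0),
                            sigma=1.0, clip=0.2, seed=0):
    """Simulate a two-arm greedy bandit with clipped action probabilities.

    The policy is updated only between batches (an assumption of this sketch).
    Returns a list of batches; each batch is a list of (arm, reward) pairs.
    """
    rng = random.Random(seed)
    totals, counts = [0.0, 0.0], [0, 0]
    p_arm1 = 0.5  # the first batch is uniformly randomized
    batches = []
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            arm = 1 if rng.random() < p_arm1 else 0
            reward = rng.gauss(means[arm], sigma)
            batch.append((arm, reward))
            totals[arm] += reward
            counts[arm] += 1
        batches.append(batch)
        # Re-estimate arm means from all data so far, then favor the leader.
        avg = [totals[k] / counts[k] if counts[k] else 0.0 for k in (0, 1)]
        p_arm1 = 1.0 - clip if avg[1] > avg[0] else clip
    return batches


def batchwise_z_statistic(batches, sigma=1.0, delta0=0.0):
    """Combine per-batch Z-scores for the margin (arm 1 mean minus arm 0 mean).

    Each batch contributes sqrt(n0 * n1 / n) * (delta_hat - delta0) / sigma,
    and the sum is divided by sqrt(number of batches).
    """
    z_sum = 0.0
    for batch in batches:
        r0 = [r for a, r in batch if a == 0]
        r1 = [r for a, r in batch if a == 1]
        if not r0 or not r1:
            raise ValueError("each batch needs at least one pull of each arm")
        delta_hat = sum(r1) / len(r1) - sum(r0) / len(r0)
        scale = math.sqrt(len(r0) * len(r1) / len(batch)) / sigma
        z_sum += scale * (delta_hat - delta0)
    return z_sum / math.sqrt(len(batches))


if __name__ == "__main__":
    # Equal arm means: the "no unique optimal arm" case from the abstract.
    data = simulate_batched_bandit(means=(0.0, 0.0), seed=1)
    print(batchwise_z_statistic(data, sigma=1.0, delta0=0.0))
```

Standardizing inside each batch uses only that batch's arm counts, which are fixed once the batch's action probability is set. That is the intuition this sketch tries to convey for why a batch-wise statistic can behave well when pooled OLS does not. It is not a substitute for the paper's formal construction.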
