Statistical Inference with M-Estimators on Adaptively Collected Data.

Authors

Zhang Kelly W, Janson Lucas, Murphy Susan A

Affiliations

Department of Computer Science, Harvard University.

Departments of Statistics, Harvard University.

Publication

Adv Neural Inf Process Syst. 2021 Dec;34:7460-7471.

Abstract

Bandit algorithms are increasingly used in real-world sequential decision-making problems. Associated with this is an increased desire to be able to use the resulting datasets to answer scientific questions like: Did one type of ad lead to more purchases? In which contexts is a mobile health intervention effective? However, classical statistical approaches fail to provide valid confidence intervals when used with data collected with bandit algorithms. Alternative methods have recently been developed for simple models (e.g., comparison of means). Yet there is a lack of general methods for conducting statistical inference using more complex models on data collected with (contextual) bandit algorithms; for example, current methods cannot be used for valid inference on parameters in a logistic regression model for a binary reward. In this work, we develop theory justifying the use of M-estimators (which include estimators based on empirical risk minimization as well as maximum likelihood) on data collected with adaptive algorithms, including (contextual) bandit algorithms. Specifically, we show that M-estimators, modified with particular adaptive weights, can be used to construct asymptotically valid confidence regions for a variety of inferential targets.
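As a rough illustration of what "M-estimators, modified with particular adaptive weights" can look like, the Python sketch below (standard library only) fits a least-squares M-estimator to data from a two-armed epsilon-greedy bandit. Each observation is weighted by W_t = sqrt(pi_ref / pi_t(A_t)), where pi_t(A_t) is the probability the bandit assigned to the action it took. The following are all assumptions made for this example: the bandit, the linear reward model, the uniform reference probability pi_ref = 0.5, the helper names (simulate_bandit, weighted_m_estimate, confidence_interval), and the sandwich variance formula. The paper's exact weights, estimators, and variance estimators should be taken from the paper itself.

```python
import math
import random


def simulate_bandit(T, theta, eps=0.2, noise_sd=1.0, seed=0):
    """Two-armed epsilon-greedy bandit (hypothetical setup for illustration).

    Rewards follow Y = theta[0] + theta[1] * A + Gaussian noise, with A in {0, 1}.
    Returns a list of (A_t, Y_t, pi_t(A_t)) tuples, where pi_t(A_t) is the
    probability the algorithm assigned to the action it actually took.
    """
    rng = random.Random(seed)
    sums, counts = [0.0, 0.0], [0, 0]
    data = []
    for _ in range(T):
        # Greedy arm under current sample means; unplayed arms count as mean 0.
        means = [sums[a] / counts[a] if counts[a] else 0.0 for a in (0, 1)]
        greedy = 1 if means[1] > means[0] else 0
        # Every arm keeps probability at least eps / 2, so the weights stay bounded.
        probs = [eps / 2, eps / 2]
        probs[greedy] += 1 - eps
        a = 1 if rng.random() < probs[1] else 0
        y = theta[0] + theta[1] * a + rng.gauss(0.0, noise_sd)
        sums[a] += y
        counts[a] += 1
        data.append((a, y, probs[a]))
    return data


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def weighted_m_estimate(data, pi_ref=0.5):
    """Weighted least-squares M-estimator with a sandwich covariance estimate.

    Minimizes sum_t W_t * (Y_t - x_t' theta)^2 with x_t = (1, A_t) and
    W_t = sqrt(pi_ref / pi_t(A_t)) (assumed square-root importance weights
    relative to a uniform reference policy).
    """
    rows = [((1.0, float(a)), y, math.sqrt(pi_ref / p)) for a, y, p in data]
    T = len(rows)
    # A_hat = (1/T) sum W x x' and b = (1/T) sum W x y (weighted normal equations).
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for x, y, w in rows:
        for i in range(2):
            b[i] += w * x[i] * y / T
            for j in range(2):
                A[i][j] += w * x[i] * x[j] / T
    A_inv = inv2(A)
    theta = [A_inv[i][0] * b[0] + A_inv[i][1] * b[1] for i in range(2)]
    # B_hat = (1/T) sum (W e)^2 x x', the empirical variance of the weighted score.
    B = [[0.0, 0.0], [0.0, 0.0]]
    for x, y, w in rows:
        e = y - theta[0] * x[0] - theta[1] * x[1]
        for i in range(2):
            for j in range(2):
                B[i][j] += (w * e) ** 2 * x[i] * x[j] / T
    # Sandwich covariance of theta_hat: A_inv B A_inv / T.
    V = matmul2(matmul2(A_inv, B), A_inv)
    cov = [[V[i][j] / T for j in range(2)] for i in range(2)]
    return theta, cov


def confidence_interval(theta, cov, index=1, z=1.96):
    """Normal-approximation interval for theta[index] (about 95% for z = 1.96)."""
    se = math.sqrt(cov[index][index])
    return theta[index] - z * se, theta[index] + z * se


data = simulate_bandit(T=2000, theta=(0.0, 0.5), seed=1)
theta_hat, cov = weighted_m_estimate(data)
print("theta_hat:", theta_hat)
print("approx. 95% CI for the arm effect:", confidence_interval(theta_hat, cov))
```

The square-root form satisfies W_t^2 * pi_t(A_t) = pi_ref. As a result, the conditional variance of each weighted score term is an average over the fixed reference probabilities rather than the adaptively chosen ones, which is one motivation for weights of this shape. Keeping every action probability at least eps / 2 bounds the weights; with eps = 0.2 they lie between sqrt(0.5 / 0.9) and sqrt(0.5 / 0.1).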

Similar articles

Post-Contextual-Bandit Inference.
Adv Neural Inf Process Syst. 2021 Dec;34:28548-28559.

Inference for Batched Bandits.
Adv Neural Inf Process Syst. 2020 Dec;33:9818-9829.

An empirical evaluation of active inference in multi-armed bandits.
Neural Netw. 2021 Dec;144:229-246. doi: 10.1016/j.neunet.2021.08.018. Epub 2021 Aug 26.

Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting.
J Am Stat Assoc. 2021;116(533):240-255. doi: 10.1080/01621459.2020.1770098. Epub 2020 Jul 7.

PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison.
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15308-15327. doi: 10.1109/TPAMI.2023.3305381. Epub 2023 Nov 3.

A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits.
IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):9887-9899. doi: 10.1109/TNNLS.2022.3161806. Epub 2023 Nov 30.
