Suppr 超能文献



Statistical Inference with M-Estimators on Adaptively Collected Data

Author Information

Zhang Kelly W, Janson Lucas, Murphy Susan A

Affiliations

Department of Computer Science, Harvard University.

Department of Statistics, Harvard University.

Publication Information

Adv Neural Inf Process Syst. 2021 Dec;34:7460-7471.

PMID: 35757490
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9232184/
Abstract

Bandit algorithms are increasingly used in real-world sequential decision-making problems. Associated with this is an increased desire to be able to use the resulting datasets to answer scientific questions like: Did one type of ad lead to more purchases? In which contexts is a mobile health intervention effective? However, classical statistical approaches fail to provide valid confidence intervals when used with data collected with bandit algorithms. Alternative methods have recently been developed for simple models (e.g., comparison of means). Yet there is a lack of general methods for conducting statistical inference using more complex models on data collected with (contextual) bandit algorithms; for example, current methods cannot be used for valid inference on parameters in a logistic regression model for a binary reward. In this work, we develop theory justifying the use of M-estimators-which includes estimators based on empirical risk minimization as well as maximum likelihood-on data collected with adaptive algorithms, including (contextual) bandit algorithms. Specifically, we show that M-estimators, modified with particular adaptive weights, can be used to construct asymptotically valid confidence regions for a variety of inferential targets.

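The weighting idea described in the abstract can be sketched in a small simulation. This is a minimal illustration under stated assumptions, not the paper's exact estimator: a two-arm ε-greedy bandit collects data adaptively, and each arm mean is then estimated by a weighted least-squares M-estimator using square-root importance weights sqrt(π̃_t / π_t), where the stabilizing policy π̃ is taken to be uniform (1/2 per arm). The variable names and settings (`true_means`, `T`, `eps`) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5])  # hypothetical arm means (assumption)
T = 5000                           # number of rounds
eps = 0.1                          # exploration rate

counts = np.zeros(2)
sums = np.zeros(2)
arms, rewards, probs = [], [], []

for t in range(T):
    # epsilon-greedy action probabilities from the running estimates
    est = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    greedy = int(np.argmax(est))
    pi = np.full(2, eps / 2)
    pi[greedy] += 1 - eps
    a = rng.choice(2, p=pi)
    r = rng.normal(true_means[a], 1.0)
    counts[a] += 1
    sums[a] += r
    arms.append(a)
    rewards.append(r)
    probs.append(pi[a])  # record the action probability pi_t(a_t)

arms = np.array(arms)
rewards = np.array(rewards)
probs = np.array(probs)

# square-root importance weights sqrt(pi_tilde / pi_t), with a uniform
# stabilizing policy pi_tilde = 1/2 per arm
w = np.sqrt(0.5 / probs)

# weighted M-estimator of each arm mean: argmin_mu sum_t w_t (r_t - mu)^2
mu_hat = np.array([
    np.sum(w[arms == a] * rewards[arms == a]) / np.sum(w[arms == a])
    for a in range(2)
])
print(mu_hat)
```

The point of the reweighting is that the resulting estimator's asymptotic distribution can be stabilized even though the action probabilities depend on past data; with unweighted averages, that data-dependence is what invalidates classical confidence intervals.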

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4299/9232184/4147342140c8/nihms-1762664-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4299/9232184/25c436e1d33f/nihms-1762664-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4299/9232184/3191d0dc3db3/nihms-1762664-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4299/9232184/6323eaaf5d15/nihms-1762664-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4299/9232184/53a6473f2305/nihms-1762664-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4299/9232184/6b4458f55707/nihms-1762664-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4299/9232184/b3c1289fc136/nihms-1762664-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4299/9232184/a51f274d9117/nihms-1762664-f0008.jpg

Similar Articles

1. Statistical Inference with M-Estimators on Adaptively Collected Data.
Adv Neural Inf Process Syst. 2021 Dec;34:7460-7471.
2. Post-Contextual-Bandit Inference.
Adv Neural Inf Process Syst. 2021 Dec;34:28548-28559.
3. Inference for Batched Bandits.
Adv Neural Inf Process Syst. 2020 Dec;33:9818-9829.
4. An empirical evaluation of active inference in multi-armed bandits.
Neural Netw. 2021 Dec;144:229-246. doi: 10.1016/j.neunet.2021.08.018. Epub 2021 Aug 26.
5. Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting.
J Am Stat Assoc. 2021;116(533):240-255. doi: 10.1080/01621459.2020.1770098. Epub 2020 Jul 7.
6. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
7. PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison.
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15308-15327. doi: 10.1109/TPAMI.2023.3305381. Epub 2023 Nov 3.
8. A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits.
IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):9887-9899. doi: 10.1109/TNNLS.2022.3161806. Epub 2023 Nov 30.
9. Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
10. Targeted estimation of nuisance parameters to obtain valid statistical inference.
Int J Biostat. 2014;10(1):29-57. doi: 10.1515/ijb-2012-0038.

Cited By

1. Adaptive randomization methods for sequential multiple assignment randomized trials (SMARTs) via Thompson sampling.
Biometrics. 2024 Oct 3;80(4). doi: 10.1093/biomtc/ujae152.
2. Online learning in bandits with predicted context.
Proc Mach Learn Res. 2024 May;238:2215-2223.
3. Effect-Invariant Mechanisms for Policy Generalization.
J Mach Learn Res. 2024;25.
4. Microrandomized Trials: Developing Just-in-Time Adaptive Interventions for Better Public Health.
Am J Public Health. 2023 Jan;113(1):60-69. doi: 10.2105/AJPH.2022.307150. Epub 2022 Nov 22.

References

1. Post-Contextual-Bandit Inference.
Adv Neural Inf Process Syst. 2021 Dec;34:28548-28559.
2. Inference for Batched Bandits.
Adv Neural Inf Process Syst. 2020 Dec;33:9818-9829.
3. Power Constrained Bandits.
Proc Mach Learn Res. 2021 Aug;149:209-259.
4. Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity.
Proc ACM Interact Mob Wearable Ubiquitous Technol. 2020 Mar;4(1). doi: 10.1145/3381007.
5. Confidence intervals for policy evaluation in adaptive experiments.
Proc Natl Acad Sci U S A. 2021 Apr 13;118(15). doi: 10.1073/pnas.2014602118.
6. Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting.
J Am Stat Assoc. 2021;116(533):240-255. doi: 10.1080/01621459.2020.1770098. Epub 2020 Jul 7.
7. Asymptotic theory for maximum likelihood estimates in reduced-rank multivariate generalized linear models.
Statistics (Ber). 2018 May 8;52(5):1005-1024. doi: 10.1080/02331888.2018.1467420. eCollection 2018.
8. Encouraging Physical Activity in Patients With Diabetes: Intervention Using a Reinforcement Learning System.
J Med Internet Res. 2017 Oct 10;19(10):e338. doi: 10.2196/jmir.7994.
9. Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.
Stat Sci. 2015;30(2):199-215. doi: 10.1214/14-STS504.
10. Marginal structural models and causal inference in epidemiology.
Epidemiology. 2000 Sep;11(5):550-60. doi: 10.1097/00001648-200009000-00011.