Performance Guarantees for Policy Learning.

Authors

Luedtke Alex, Chambaz Antoine

Affiliations

Department of Statistics, University of Washington, USA.

Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, USA.

Publication information

Ann I H P Probab Stat. 2020 Aug;56(3):2162-2188. doi: 10.1214/19-aihp1034. Epub 2020 Jun 26.

Abstract

This article gives performance guarantees for the regret decay in optimal policy estimation. We give a margin-free result showing that the regret decay for estimating a within-class optimal policy is second-order for empirical risk minimizers over Donsker classes when the data are generated from a fixed data distribution that does not change with sample size, with regret decaying at a faster rate than the standard error of an efficient estimator of the value of an optimal policy. We also present a result giving guarantees on the regret decay of policy estimators for the case that the policy falls within a restricted class and the data are generated from local perturbations of a fixed distribution, where this guarantee is uniform in the direction of the local perturbation. Finally, we give a result from the classification literature that shows that faster regret decay is possible via plug-in estimation provided a margin condition holds. Three examples are considered. In these examples, the regret is expressed in terms of either the mean value or the median value, and the number of possible actions is either two or finitely many.
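As a rough illustration of what "regret" means for an empirical risk minimizer over a restricted policy class, the sketch below simulates a two-action problem and measures the regret of an estimated threshold policy. Everything in it is an assumption made for illustration and is not taken from the article: the data-generating process (X uniform on [-1, 1], E[Y | A = a, X = x] = a·x), the known randomization probability of 0.5, the threshold class d_t(x) = 1{x > t}, the inverse-probability-weighted value estimate, and the function name `simulate_regret`. It does not reproduce or verify the paper's rate results.

```python
import numpy as np


def simulate_regret(n, seed=0):
    """Estimate a threshold policy by maximizing an IPW value estimate
    and return (estimated threshold, regret within the threshold class).

    Hypothetical setup, not from the article:
      X ~ Uniform(-1, 1), A ~ Bernoulli(0.5), Y = A * X + N(0, 1) noise.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    a = rng.integers(0, 2, size=n)             # randomized action, P(A = 1) = 0.5
    y = a * x + rng.normal(0.0, 1.0, size=n)   # E[Y | A = a, X = x] = a * x

    # Candidate policies d_t(x) = 1{x > t} on a grid of thresholds.
    thresholds = np.linspace(-1.0, 1.0, 201)

    def ipw_value(t):
        # Inverse-probability-weighted estimate of the mean outcome under d_t.
        d = (x > t).astype(int)
        return np.mean((a == d) * y / 0.5)

    estimated_values = np.array([ipw_value(t) for t in thresholds])
    t_hat = thresholds[np.argmax(estimated_values)]

    # Under this setup the true value of d_t is E[X * 1{X > t}] = (1 - t^2) / 4,
    # which is maximized at t = 0.
    def true_value(t):
        return (1.0 - t ** 2) / 4.0

    regret = true_value(0.0) - true_value(t_hat)
    return t_hat, regret


if __name__ == "__main__":
    for n in (200, 2000, 20000):
        t_hat, regret = simulate_regret(n, seed=1)
        print(f"n={n:6d}  t_hat={t_hat:+.2f}  regret={regret:.5f}")
```

In this toy setup the regret equals t_hat²/4, so it is quadratic in the estimation error of the threshold. That is one informal way to see why regret can shrink faster than the error of the estimated policy parameter itself.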

Similar articles

1
Performance Guarantees for Policy Learning.
Ann I H P Probab Stat. 2020 Aug;56(3):2162-2188. doi: 10.1214/19-aihp1034. Epub 2020 Jun 26.
5
Robust regression for optimal individualized treatment rules.
Stat Med. 2019 May 20;38(11):2059-2073. doi: 10.1002/sim.8102. Epub 2019 Feb 11.
9
Ensemble estimators for multivariate entropy estimation.
IEEE Trans Inf Theory. 2013 Jul;59(7):4374-4388. doi: 10.1109/TIT.2013.2251456.
