Universal sieve-based strategies for efficient estimation using machine learning tools.

Author information

Qiu Hongxiang, Luedtke Alex, Carone Marco

Affiliations

Department of Biostatistics, University of Washington, Seattle, WA, USA.

Department of Statistics, University of Washington, Seattle, WA, USA.

Publication information

Bernoulli (Andover). 2021 Nov;27(4):2300-2336. doi: 10.3150/20-BEJ1309. Epub 2021 Aug 24.

DOI: 10.3150/20-BEJ1309

PMID: 34733110

Full text (PMC): https://pmc.ncbi.nlm.nih.gov/articles/PMC8561841/

Abstract

Suppose that we wish to estimate a finite-dimensional summary of one or more function-valued features of an underlying data-generating mechanism under a nonparametric model. One approach to estimation is by plugging in flexible estimates of these features. Unfortunately, in general, such estimators may not be asymptotically efficient, which often makes these estimators difficult to use as a basis for inference. Though there are several existing methods to construct asymptotically efficient plug-in estimators, each such method either can only be derived using knowledge of efficiency theory or is only valid under stringent smoothness assumptions. Among existing methods, sieve estimators stand out as particularly convenient because efficiency theory is not required in their construction, their tuning parameters can be selected data adaptively, and they are universal in the sense that the same fits lead to efficient plug-in estimators for a rich class of estimands. Inspired by these desirable properties, we propose two novel universal approaches for estimating function-valued features that can be analyzed using sieve estimation theory. Compared to traditional sieve estimators, these approaches are valid under more general conditions on the smoothness of the function-valued features by utilizing flexible estimates that can be obtained, for example, using machine learning.
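For concreteness, the sketch below illustrates the traditional sieve plug-in construction that the abstract uses as its point of comparison; it is not one of the two estimators proposed in the paper. It assumes a scalar covariate W, a binary indicator A, an outcome Y, and the finite-dimensional summary ψ = E[μ(1, W)] of the function-valued feature μ(1, w) = E[Y | A = 1, W = w]. The polynomial basis, the candidate degrees, 5-fold cross-validation, and all function names are hypothetical illustrative choices. The sieve size (polynomial degree) plays the role of the data-adaptively selected tuning parameter, and the same fitted μ(1, ·) could be plugged into other summaries as well.

```python
import numpy as np


def poly_basis(w, degree):
    """Sieve basis 1, w, ..., w**degree for a scalar covariate (hypothetical helper)."""
    return np.vander(w, N=degree + 1, increasing=True)


def fit_sieve(w, y, degree):
    """Least-squares projection of y onto the degree-`degree` polynomial sieve."""
    coef, *_ = np.linalg.lstsq(poly_basis(w, degree), y, rcond=None)
    return coef


def cv_degree(w, y, degrees, n_folds=5, seed=0):
    """Choose the sieve size (polynomial degree) by K-fold cross-validated squared error."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(w)) % n_folds  # balanced random fold labels
    risks = []
    for d in degrees:
        sse = 0.0
        for k in range(n_folds):
            train, test = folds != k, folds == k
            coef = fit_sieve(w[train], y[train], d)
            sse += np.sum((y[test] - poly_basis(w[test], d) @ coef) ** 2)
        risks.append(sse / len(w))
    return degrees[int(np.argmin(risks))]


def sieve_plugin_treated_mean(w, a, y, degrees=(1, 2, 3, 4, 5)):
    """Plug-in estimate of psi = E[mu(1, W)], where mu(1, w) = E[Y | A = 1, W = w]."""
    treated = a == 1
    d = cv_degree(w[treated], y[treated], list(degrees))
    coef = fit_sieve(w[treated], y[treated], d)
    # Plug in: average the fitted mu(1, .) over the covariates of *all* units.
    return float(np.mean(poly_basis(w, d) @ coef)), d


# Simulated data (an assumption for illustration): mu(1, w) = w**2 + 1, so psi = 4/3.
rng = np.random.default_rng(1)
n = 2000
w = rng.uniform(-1, 1, n)
a = rng.binomial(1, 1 / (1 + np.exp(-w)))
y = w**2 + a + rng.normal(0, 0.5, n)
est, degree = sieve_plugin_treated_mean(w, a, y)
print(f"selected degree = {degree}, estimate = {est:.3f} (truth = {4 / 3:.3f})")
```

Whether a plug-in estimator of this kind is asymptotically efficient depends on the smoothness of μ and on how the sieve grows with the sample size; the simulated data here serve only as a sanity check, not as evidence about the paper's methods.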


Similar articles

Collaborative double robust targeted maximum likelihood estimation.

Int J Biostat. 2010 May 17;6(1):Article 17. doi: 10.2202/1557-4679.1181.

A Flexible Framework for Nonparametric Graphical Modeling that Accommodates Machine Learning.

Proc Mach Learn Res. 2020 Jul;119:10442-10451.

Estimating and Testing Vaccine Sieve Effects Using Machine Learning.

J Am Stat Assoc. 2019;114(527):1038-1049. doi: 10.1080/01621459.2018.1529594. Epub 2019 Apr 3.

A Sieve M-Theorem for Bundled Parameters in Semiparametric Models, with Application to the Efficient Estimation in a Linear Model for Censored Data.

Ann Stat. 2011;39(6):2795-3443.

Asymptotic Properties of Neural Network Sieve Estimators.

J Nonparametr Stat. 2023;35(4):839-868. doi: 10.1080/10485252.2023.2209218. Epub 2023 May 13.

Semiparametric Latent-Class Models for Multivariate Longitudinal and Survival Data.

Ann Stat. 2022 Feb;50(1):487-510. doi: 10.1214/21-aos2117. Epub 2022 Feb 16.

Targeted estimation of nuisance parameters to obtain valid statistical inference.

Int J Biostat. 2014;10(1):29-57. doi: 10.1515/ijb-2012-0038.

Toward computerized efficient estimation in infinite-dimensional models.

J Am Stat Assoc. 2019;114(527):1174-1190. doi: 10.1080/01621459.2018.1482752. Epub 2018 Sep 13.

A Generally Efficient Targeted Minimum Loss Based Estimator based on the Highly Adaptive Lasso.

Int J Biostat. 2017 Oct 12;13(2). doi: 10.1515/ijb-2015-0097.

Cited by

Prediction sets adaptive to unknown covariate shift.

J R Stat Soc Series B Stat Methodol. 2023 Jul 17;85(5):1680-1705. doi: 10.1093/jrsssb/qkad069. eCollection 2023 Nov.

