Cai Weixin, van der Laan Mark
Division of Biostatistics, University of California, Berkeley, USA.
Int J Biostat. 2020 Aug 10. doi: 10.1515/ijb-2017-0070.
The Highly-Adaptive least absolute shrinkage and selection operator (LASSO) Targeted Minimum Loss Estimator (HAL-TMLE) is an efficient plug-in estimator of a pathwise differentiable parameter in a statistical model that at minimum (and possibly only) assumes that the sectional variation norm of the true nuisance functions (i.e., the relevant part of the data distribution) is finite. It relies on an initial estimator (HAL-MLE) of the nuisance functions, obtained by minimizing the empirical risk over the parameter space under the constraint that the sectional variation norm of the candidate functions is bounded by a constant, where this constant can be selected with cross-validation. In this article we establish that the nonparametric bootstrap for the HAL-TMLE, fixing the value of the sectional variation norm at a value larger than or equal to the cross-validation selector, provides a consistent method for estimating the normal limit distribution of the HAL-TMLE. To optimize the finite sample coverage of the nonparametric bootstrap confidence intervals, we propose a selection method for this sectional variation norm that is based on running the nonparametric bootstrap for all values of the sectional variation norm larger than the one selected by cross-validation, and subsequently determining a value at which the width of the resulting confidence intervals reaches a plateau. We demonstrate our method for 1) nonparametric estimation of the average treatment effect when observing a covariate vector, binary treatment, and outcome, and 2) nonparametric estimation of the integral of the square of the multivariate density of the data distribution. We also present simulation results for these two examples demonstrating the excellent finite sample coverage of bootstrap-based confidence intervals.
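For orientation, the two target parameters named in the abstract can be written out explicitly. The notation below is an assumption based on standard conventions and is not taken from the abstract: O = (W, A, Y) denotes the covariate vector, binary treatment, and outcome; P_0 is the true data distribution; and p_0 is its density.

```latex
% Example 1: average treatment effect, with O = (W, A, Y) and A binary
\[
  \Psi_1(P_0) = E_{P_0}\bigl[\, E_{P_0}(Y \mid A = 1, W) - E_{P_0}(Y \mid A = 0, W) \,\bigr]
\]
% Example 2: integral of the squared multivariate density p_0 of O
\[
  \Psi_2(P_0) = \int p_0(o)^2 \, do
\]
```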
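The plateau-based selection of the sectional variation norm bound can be illustrated with a minimal sketch. It assumes the bootstrap confidence interval widths have already been computed on an increasing grid of bounds at or above the cross-validated choice. The function name `select_plateau`, the relative tolerance `rel_tol`, and the rule "first grid point where the width grows by at most `rel_tol`" are hypothetical choices made for illustration; the abstract does not specify how the plateau is detected.

```python
from typing import Sequence


def select_plateau(
    bounds: Sequence[float],
    widths: Sequence[float],
    rel_tol: float = 0.05,
) -> float:
    """Pick the first bound at which the CI width stops changing appreciably.

    bounds:  increasing grid of candidate sectional variation norm bounds,
             all at or above the cross-validated selector.
    widths:  bootstrap confidence interval width obtained at each bound.
    rel_tol: hypothetical tolerance; a relative change in width of at most
             rel_tol between neighbouring grid points counts as a plateau.
    """
    if len(bounds) == 0 or len(bounds) != len(widths):
        raise ValueError("bounds and widths must be non-empty and equally long")
    if any(w <= 0 for w in widths):
        raise ValueError("interval widths must be positive")

    for i in range(len(bounds) - 1):
        # Relative change in width when moving to the next, larger bound.
        change = abs(widths[i + 1] - widths[i]) / widths[i]
        if change <= rel_tol:
            return bounds[i]

    # No plateau detected on this grid: fall back to the largest bound.
    return bounds[-1]


# Illustrative numbers only, not results from the article.
grid = [1.0, 2.0, 4.0, 8.0, 16.0]
ci_widths = [0.20, 0.30, 0.38, 0.39, 0.39]
chosen = select_plateau(grid, ci_widths)
```

With these made-up widths, the relative change first drops below 5% between the third and fourth grid points, so the sketch returns the third bound. A finer grid or a different tolerance would change the choice, so `rel_tol` should be treated as a tuning parameter rather than a recommended value.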