van der Laan Mark J
Int J Biostat. 2014;10(1):29-57. doi: 10.1515/ijb-2012-0038.
To obtain concrete results, we focus on estimation of the treatment-specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model assumes at most restrictions on the conditional distribution of treatment given the covariates, the so-called propensity score. Estimators of the treatment-specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. To make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters, such as ensemble learning and specifically super-learning. Because such estimators make an optimal trade-off of bias and variance with respect to the infinite-dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order, and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment-specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with a known influence curve, allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. As a special case, we also demonstrate the required targeting of the propensity score for the inverse probability of treatment weighted estimator that uses super-learning to fit the propensity score.
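For orientation, the two estimator families named in the abstract can be sketched in a few lines. The block below is a minimal illustration under stated assumptions, not the estimators proposed in the paper: it implements the standard IPTW estimator and a standard TMLE of the treatment-specific mean E[E(Y | A=1, W)] for a binary outcome, with plain logistic regressions standing in for super-learning. It does not perform the additional targeting of the nuisance estimators or the collaborative variable selection that the abstract describes. All function names, the simulated data-generating process, and the truncation bound `g_bound` are hypothetical choices made for the example.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return np.log(p / (1.0 - p))


def fit_logistic(X, y, offset=None, n_iter=100, tol=1e-10):
    """Logistic regression by Newton-Raphson, with an optional fixed offset."""
    n, d = X.shape
    offset = np.zeros(n) if offset is None else offset
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        grad = X.T @ (y - p)
        # Tiny ridge term only guards against a singular Hessian.
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-10 * np.eye(d)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def treatment_specific_mean(W, A, Y, g_bound=0.01):
    """IPTW and standard TMLE of E[E(Y | A=1, W)] for binary A and Y."""
    n = len(Y)
    ones = np.ones((n, 1))

    # Propensity score g(W) = P(A=1 | W); parametric stand-in for super-learning.
    Xg = np.hstack([ones, W])
    g = np.clip(expit(Xg @ fit_logistic(Xg, A)), g_bound, 1.0)

    # Initial outcome regression Q(A, W) = E(Y | A, W).
    XQ = np.hstack([ones, A[:, None], W])
    bQ = fit_logistic(XQ, Y)
    X1 = np.hstack([ones, ones, W])  # same design with A set to 1
    Q_A = np.clip(expit(XQ @ bQ), 1e-6, 1 - 1e-6)
    Q_1 = np.clip(expit(X1 @ bQ), 1e-6, 1 - 1e-6)

    # Inverse probability of treatment weighted estimator.
    psi_iptw = np.mean(A * Y / g)

    # Targeting step: fluctuate Q along the clever covariate H = A / g.
    H = A / g
    eps = fit_logistic(H[:, None], Y, offset=logit(Q_A))[0]
    Q_A_star = expit(logit(Q_A) + eps * H)
    Q_1_star = expit(logit(Q_1) + eps / g)
    psi_tmle = np.mean(Q_1_star)

    # Estimated efficient influence curve and the resulting standard error.
    ic = H * (Y - Q_A_star) + Q_1_star - psi_tmle
    se = np.std(ic, ddof=1) / np.sqrt(n)
    return {"iptw": psi_iptw, "tmle": psi_tmle, "se": se, "ic": ic}


def simulate(n, seed=0):
    """Hypothetical data-generating process used only for illustration."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, 2))
    A = rng.binomial(1, expit(0.5 * W[:, 0] - 0.5 * W[:, 1])).astype(float)
    Y = rng.binomial(1, expit(-0.5 + A + 0.7 * W[:, 0])).astype(float)
    return W, A, Y


if __name__ == "__main__":
    W, A, Y = simulate(5000)
    res = treatment_specific_mean(W, A, Y)
    lo, hi = res["tmle"] - 1.96 * res["se"], res["tmle"] + 1.96 * res["se"]
    print(f"IPTW {res['iptw']:.3f}  TMLE {res['tmle']:.3f}  95% CI ({lo:.3f}, {hi:.3f})")
```

In this sketch both working models are correctly specified for the simulated data, so the influence-curve-based confidence interval is a reasonable approximation. The abstract's concern is the harder case in which the nuisance parameters are fitted data-adaptively, where the bias of the plug-in need not be second order without further targeting.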