Robins J M, Ritov Y
Harvard School of Public Health, Boston, MA 02115, USA.
Stat Med. 1997;16(1-3):285-319. doi: 10.1002/(sici)1097-0258(19970215)16:3<285::aid-sim535>3.0.co;2-#.
We argue that, due to the curse of dimensionality, there are major difficulties with any pure or smoothed likelihood-based method of inference in designed studies with randomly missing data when missingness depends on a high-dimensional vector of variables. We study in detail a semi-parametric superpopulation version of continuously stratified random sampling. We show that all estimators of the population mean that are uniformly consistent or that achieve an algebraic rate of convergence, no matter how slow, require the use of the selection (randomization) probabilities. We argue that, in contrast to likelihood methods which ignore these probabilities, inverse selection probability weighted estimators continue to perform well, achieving uniform n^{1/2}-rates of convergence. We propose a curse of dimensionality appropriate (CODA) asymptotic theory for inference in non- and semi-parametric models in an attempt to formalize our arguments. We discuss whether our results constitute a fatal blow to the likelihood principle and study the attitude toward these results that a committed subjective Bayesian would adopt. Finally, we apply our CODA theory to analyse the effect of the 'curse of dimensionality' in several interesting semi-parametric models, including a model for a two-armed randomized trial with randomization probabilities depending on a vector of continuous pretreatment covariates X. We provide substantive settings under which a subjective Bayesian would ignore the randomization probabilities in analysing the trial data. We then show that any statistician who ignores the randomization probabilities is unable to construct nominal 95 per cent confidence intervals for the true treatment effect that have both: (i) an expected length which goes to zero with increasing sample size; and (ii) a guaranteed expected actual coverage rate of at least 95 per cent over the ensemble of trials analysed by the statistician during his or her lifetime. However, we derive a new interval estimator, depending on the randomization probabilities, that satisfies (i) and (ii).
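As an illustration of the inverse selection probability weighted estimators mentioned in the abstract, the following is a minimal sketch of the standard Horvitz-Thompson form of such an estimator for a population mean. It is not taken from the paper: the function names `ipw_mean` and `simulate`, the five-dimensional uniform covariate, the linear selection probability, and the outcome model are all assumptions chosen only to make the example runnable.

```python
import numpy as np


def ipw_mean(y, observed, pi):
    """Inverse-probability-weighted (Horvitz-Thompson style) mean.

    y        : outcomes; entries where `observed` is False are ignored
    observed : boolean selection indicators
    pi       : known selection probabilities, each in (0, 1]
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    pi = np.asarray(pi, dtype=float)
    if np.any(pi <= 0) or np.any(pi > 1):
        raise ValueError("selection probabilities must lie in (0, 1]")
    # Each observed outcome is weighted by 1/pi; unobserved units contribute 0.
    weighted = np.where(observed, y / pi, 0.0)
    return weighted.mean()


def simulate(n=20000, d=5, seed=0):
    """Hypothetical data: selection depends on the covariates (assumed design)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(n, d))
    pi = 0.1 + 0.8 * x[:, 0]                  # known selection probabilities
    y = 2.0 * x[:, 0] + rng.normal(size=n)    # true population mean is 1.0
    observed = rng.uniform(size=n) < pi
    return y, observed, pi
```

In this assumed design, the unweighted mean of the observed outcomes, `y[observed].mean()`, is biased upward because units with larger outcomes are more likely to be selected, whereas `ipw_mean(y, observed, pi)` uses the known probabilities and stays close to the true mean of 1.0.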