Vansteelandt Stijn, Rotnitzky Andrea, Robins James
Department of Applied Mathematics and Computer Sciences, Ghent University, 9000 Ghent, Belgium.
Department of Economics, Di Tella University, Buenos Aires, Argentina.
Biometrika. 2007 Dec;94(4):841-860. doi: 10.1093/biomet/asm070.
We propose a new class of models for making inference about the mean of a vector of repeated outcomes when the outcome vector is incompletely observed in some study units and missingness is nonmonotone. Each model in our class is indexed by a set of unidentified selection bias functions which quantify the residual association between the outcome at each occasion t and the probability that this outcome is missing, after adjusting for variables observed prior to time t and for the past nonresponse pattern. In particular, selection bias functions equal to zero encode the investigator's a priori belief that nonresponse of the next outcome does not depend on that outcome after adjusting for the observed past. We call this assumption sequential explainability. Since each model in our class is nonparametric, it fits the data perfectly well. As such, our models are ideal for conducting sensitivity analyses aimed at evaluating the impact that different degrees of departure from sequential explainability have on inference about the marginal means of interest. Although the marginal means are identified under each of our models, their estimation is not feasible in practice because it requires the auxiliary estimation of conditional expectations and probabilities given high-dimensional variables. We henceforth discuss estimation of the marginal means under each model in our class assuming, additionally, that at each occasion one of the following two models holds: a parametric model for the conditional probability of nonresponse given current outcomes and past recorded data, or a parametric model for the conditional mean of the outcome on the nonrespondents given the past recorded data. We call the resulting procedure 2^T-multiply robust as it protects at each of the T time points against misspecification of one of these two working models, although not against simultaneous misspecification of both.
We extend our proposed class of models and estimators to incorporate data configurations which include baseline covariates and a parametric model for the conditional mean of the vector of repeated outcomes given the baseline covariates.
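The role of a selection bias function in a sensitivity analysis can be illustrated with a deliberately simplified sketch: a single occasion, no covariates, and an exponential-tilt form in which the log odds of nonresponse are linear in the outcome with slope alpha. The tilt form, the function name `tilted_mean`, and the parameter `alpha` are assumptions made here for illustration; they are not taken from the abstract, and this is not the multiply robust estimator the paper proposes. With alpha equal to zero the sketch reduces to the respondent mean, which corresponds to explainable nonresponse in this one-occasion setting.

```python
import math


def tilted_mean(outcomes, alpha):
    """Marginal mean of one outcome under an assumed exponential-tilt
    selection bias function (illustrative only).

    outcomes: list of floats, with None where the outcome is missing.
    alpha:    assumed log odds ratio of nonresponse per unit of outcome;
              alpha = 0 means nonresponse does not depend on the outcome.
    """
    y_obs = [y for y in outcomes if y is not None]
    if not y_obs:
        raise ValueError("at least one observed outcome is required")

    p_obs = len(y_obs) / len(outcomes)

    # Under the tilt, the nonrespondent density is proportional to
    # exp(alpha * y) times the respondent density. Subtracting the
    # largest exponent keeps math.exp from overflowing.
    shift = max(alpha * y for y in y_obs)
    weights = [math.exp(alpha * y - shift) for y in y_obs]

    mean_resp = sum(y_obs) / len(y_obs)
    mean_nonresp = sum(w * y for w, y in zip(weights, y_obs)) / sum(weights)

    # Mix respondent and (imputed) nonrespondent means by their shares.
    return p_obs * mean_resp + (1.0 - p_obs) * mean_nonresp


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, None]
    # Vary alpha over a grid to see how the estimate moves away from
    # the alpha = 0 (explainable nonresponse) value.
    for alpha in (-1.0, 0.0, 1.0):
        print(alpha, round(tilted_mean(data, alpha), 4))
```

Reporting the estimate over a range of alpha values, rather than at a single value, is the point of the exercise: alpha is not identified from the observed data, so the range shows how strongly the conclusion depends on the assumed departure from explainable nonresponse.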