Long Qi, Hsu Chiu-Hsieh, Li Yisheng
Emory University.
Stat Sin. 2012;22:149-172.
Missing data are common in medical and social science studies and often pose a serious challenge in data analysis. Multiple imputation methods are popular and natural tools for handling missing data, replacing each missing value with a set of plausible values that represent the uncertainty about the underlying values. We consider a case of missing at random (MAR) and investigate the estimation of the marginal mean of an outcome variable in the presence of missing values when a set of fully observed covariates is available. We propose a new nonparametric multiple imputation (MI) approach that uses two working models to achieve dimension reduction and define the imputing sets for the missing observations. Compared with existing nonparametric imputation procedures, our approach can better handle covariates of high dimension, and is doubly robust in the sense that the resulting estimator remains consistent if either of the working models is correctly specified. Compared with existing doubly robust methods, our nonparametric MI approach is more robust to the misspecification of both working models; it also avoids the use of inverse-weighting and hence is less sensitive to missing probabilities that are close to 1. We propose a sensitivity analysis for evaluating the validity of the working models, allowing investigators to choose the optimal weights so that the resulting estimator relies either completely or more heavily on the working model that is likely to be correctly specified and achieves improved efficiency. We investigate the asymptotic properties of the proposed estimator, and perform simulation studies to show that the proposed method compares favorably with some existing methods in finite samples. The proposed method is further illustrated using data from a colorectal adenoma study.
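The imputation idea described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a linear working model for the outcome, a logistic working model for the probability of being observed, a weighted Euclidean distance on the two standardized scores, and a fixed number of nearest neighbors as the imputing set. The names `nn_mi_mean`, `w1`, `K`, and `M` are hypothetical, and the sketch fits the working models only once (a proper MI procedure would typically refit them on a bootstrap sample for each imputed data set).

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept (assumed outcome working model)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def fit_logistic(X, r, n_iter=25):
    """Logistic fit with intercept via Newton-Raphson (assumed missingness working model)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        w = p * (1.0 - p)
        hessian = A.T @ (A * w[:, None]) + 1e-8 * np.eye(A.shape[1])
        beta += np.linalg.solve(hessian, A.T @ (r - p))
    return beta

def standardize(s):
    sd = s.std()
    return (s - s.mean()) / (sd if sd > 0 else 1.0)

def nn_mi_mean(X, y, observed, w1=0.5, K=5, M=10, seed=0):
    """Estimate the marginal mean of y by nearest-neighbor multiple imputation.

    X        : (n, p) fully observed covariates
    y        : (n,) outcome; entries where observed is False are ignored
    observed : (n,) boolean indicator that y is observed
    w1       : weight on the outcome score; 1 - w1 goes to the missingness score
    K        : size of each imputing set; M : number of imputed data sets
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    observed = np.asarray(observed, bool)
    A = np.column_stack([np.ones(len(y)), X])
    obs = np.flatnonzero(observed)
    mis = np.flatnonzero(~observed)

    # Two working models reduce the covariates to two scalar scores per subject.
    s1 = standardize(A @ fit_linear(X[obs], y[obs]))
    s2 = standardize(A @ fit_logistic(X, observed.astype(float)))

    # Weighted squared distance from each missing subject to each observed subject.
    d2 = (w1 * (s1[mis, None] - s1[None, obs]) ** 2
          + (1.0 - w1) * (s2[mis, None] - s2[None, obs]) ** 2)
    k = min(K, len(obs))
    nearest = obs[np.argsort(d2, axis=1)[:, :k]]  # imputing sets, shape (n_mis, k)

    estimates = []
    for _ in range(M):
        y_imp = y.copy()
        pick = rng.integers(0, k, size=len(mis))  # one random donor per missing subject
        y_imp[mis] = y[nearest[np.arange(len(mis)), pick]]
        estimates.append(y_imp.mean())
    return float(np.mean(estimates))
```

Setting `w1=1.0` makes the imputing sets depend only on the outcome score, and `w1=0.0` only on the missingness score. Rerunning the estimate over a grid of `w1` values is one simple way to mimic the kind of sensitivity analysis the abstract describes. Because no inverse-probability weights appear anywhere, subjects with a high probability of missingness do not receive extreme weights.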