Paik Myunghee Cho, Wang Cuiling
Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168 Street, New York City, N.Y. 10032, U.S.A.
J Stat Plan Inference. 2009 Jul 1;139(7):2341-2350. doi: 10.1016/j.jspi.2008.10.024.
When data are missing, analyzing only the completely observed records may cause bias or inefficiency. Existing approaches to handling missing data include likelihood, imputation and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of the outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao and Lipsitz, 1992). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computations and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under a wide range of practical situations, especially when the missingness proportion is large.
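For readers unfamiliar with the baseline methods named above, the following is a minimal sketch of complete-record analysis versus inverse probability weighting in a linear regression. It is not the estimators proposed in this paper. Everything in it is an illustrative assumption: the simulated data, the true coefficients (1, 2), the observation probability `pi` (taken as known and depending only on the always-observed outcome), and the helper names `weighted_least_squares` and `simulate_and_fit`.

```python
import numpy as np


def weighted_least_squares(X, y, w):
    """Solve the weighted normal equations (X' W X) beta = X' W y."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


def simulate_and_fit(n=20000, seed=0):
    """Compare complete-record and IPW fits on simulated data (assumed setup)."""
    rng = np.random.default_rng(seed)

    # Assumed model: y = 1 + 2 x + noise; y is always observed, x may be missing.
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)

    # Assumed observation probability: depends only on the observed outcome y
    # and is bounded in [0.2, 0.8], so the inverse weights never exceed 5.
    pi = 0.2 + 0.6 / (1.0 + np.exp(-y))
    r = rng.random(n) < pi  # True where the record is completely observed

    X = np.column_stack([np.ones(n), x])

    # Complete-record analysis: unweighted least squares on observed records.
    cc = weighted_least_squares(X[r], y[r], np.ones(int(r.sum())))

    # Inverse probability weighting: each observed record is weighted by 1/pi.
    ipw = weighted_least_squares(X[r], y[r], 1.0 / pi[r])
    return cc, ipw


if __name__ == "__main__":
    cc, ipw = simulate_and_fit()
    print("complete-record estimate:", cc)
    print("IPW estimate:            ", ipw)
```

In this sketch the probabilities are deliberately bounded away from zero. If `pi` were allowed to approach zero, a few records would receive very large weights, which is the weight instability that motivates the alternative weighting method described in the abstract.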