Kang Terri, Kraft Peter, Gauderman W James, Thomas Duncan
Department of Preventive Medicine, University of Southern California, Los Angeles, California, USA.
BMC Genet. 2003 Dec 31;4 Suppl 1(Suppl 1):S43. doi: 10.1186/1471-2156-4-S1-S43.
Missing data are a major concern in longitudinal studies because few subjects have complete data and missingness itself may indicate an adverse outcome. Analyses that exclude potentially informative observations because of missing data can be inefficient or biased. To assess the extent of these problems in genetic analyses, we compared case-wise deletion with two multiple imputation methods available in the popular SAS package: the propensity score method and the regression method. For both the real and simulated data sets, the two imputation methods produced results similar to those from case-wise deletion. For the simulated data, however, the heritability estimates from case-wise deletion and from both multiple imputation methods were much lower than those from the complete data. This suggests that if missingness patterns are correlated within families, imputation methods that do not allow for this correlation can yield biased results.
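As a rough illustration of the comparison described above, the following Python sketch contrasts case-wise deletion with a simplified regression-based multiple imputation, pooled using Rubin's rules. It is not the authors' SAS analysis. The simulated trait, the covariate, the missingness rule, and all function names (`fit_line`, `casewise_mean`, `rubin_pool`, `mi_mean`, `simulate`) are assumptions made for this example. The imputation step holds the fitted regression parameters fixed rather than drawing them from a posterior, and the data contain no family structure, so the sketch cannot reproduce the within-family correlation issue raised in the abstract.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def casewise_mean(ys):
    """Case-wise deletion: drop subjects whose trait value is missing (None)."""
    return statistics.fmean(y for y in ys if y is not None)


def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance W + (1 + 1/m) * B."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)  # denominator m - 1
    return q_bar, within + (1 + 1 / m) * between


def mi_mean(xs, ys, rng, m=5):
    """Simplified regression imputation of missing y from x, repeated m times.

    Assumption: regression parameters are held fixed across imputations,
    which understates uncertainty compared with a full implementation.
    """
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in obs], [y for _, y in obs])
    estimates, variances = [], []
    for _ in range(m):
        completed = [
            y if y is not None else a + b * x + rng.gauss(0, sd)
            for x, y in zip(xs, ys)
        ]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return rubin_pool(estimates, variances)


def simulate(seed=0, n=2000):
    """Hypothetical data: y = 2 + 1.5*x + noise; y is often missing when x > 0.5."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [2 + 1.5 * x + rng.gauss(0, 1) for x in xs]
    ys_missing = [
        None if (x > 0.5 and rng.random() < 0.7) else y
        for x, y in zip(xs, ys)
    ]
    return xs, ys_missing, rng


if __name__ == "__main__":
    xs, ys, rng = simulate()
    mi_est, mi_var = mi_mean(xs, ys, rng)
    print("true mean of y:         2.0")
    print("case-wise deletion:    ", round(casewise_mean(ys), 3))
    print("multiple imputation:   ", round(mi_est, 3), "+/-", round(mi_var ** 0.5, 3))
```

In this toy setup missingness depends only on an observed covariate, so the imputation model can correct the bias that case-wise deletion incurs. That differs from the simulated-data result reported in the abstract, where both imputation methods behaved like case-wise deletion, which the authors attribute to missingness patterns being correlated within families.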