Schafer J L, Olsen M K
Multivariate Behav Res. 1998 Oct 1;33(4):545-71. doi: 10.1207/s15327906mbr3304_5.
Analyses of multivariate data are frequently hampered by missing values. Until recently, the only missing-data methods available to most data analysts have been relatively ad hoc practices such as listwise deletion. Recent dramatic advances in theoretical and computational statistics, however, have produced a new generation of flexible procedures with a sound statistical basis. These procedures involve multiple imputation (Rubin, 1987), a simulation technique that replaces each missing datum with a set of m > 1 plausible values. The m versions of the complete data are analyzed by standard complete-data methods, and the results are combined using simple rules to yield estimates, standard errors, and p-values that formally incorporate missing-data uncertainty. New computational algorithms and software described in a recent book (Schafer, 1997a) allow us to create proper multiple imputations in complex multivariate settings. This article reviews the key ideas of multiple imputation, discusses the software programs currently available, and demonstrates their use on data from the Adolescent Alcohol Prevention Trial (Hansen & Graham, 1991).
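The abstract does not spell out the "simple rules" used to combine the m complete-data results. As a minimal sketch, assuming the standard formulation from Rubin (1987) for a single scalar quantity, the pooled estimate is the mean of the m estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and its arguments are hypothetical and are not taken from the article or its software.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m complete-data results for one scalar quantity.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m > 1 estimates with matching variances")

    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b    # total variance

    # Degrees of freedom for the t reference distribution;
    # infinite when the imputations agree exactly (b == 0).
    if b > 0:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    else:
        df = math.inf

    return q_bar, math.sqrt(total), df


# Illustrative inputs only, not data from the article.
estimate, std_error, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

A p-value would then come from comparing `estimate / std_error` with a t distribution on `df` degrees of freedom. That step is left out here to keep the sketch free of third-party dependencies.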