Steele Russell J, Wang Naisyin, Raftery Adrian E
McGill University.
Stat Methodol. 2010 May 1;7(3):351-364. doi: 10.1016/j.stamet.2010.01.003.
We consider two difficulties with standard multiple imputation methods for missing data that build confidence intervals using Rubin's t method: the intervals are often excessively wide, and they are unstable. These problems arise most often when the number of imputed copies is small, as is often the case when a data collection organization makes multiple completed datasets available for analysis. We suggest using mixtures of normals as an alternative to Rubin's t. We also examine the performance of improper imputation methods as an alternative to generating copies from the true posterior distribution of the missing observations. We report the results of simulation studies and analyses of data on health-related quality of life in which the methods suggested here gave narrower confidence intervals and more stable inferences, especially with small numbers of copies or non-normal posterior distributions of parameter estimates. A free R software package called MImix that implements our methods is available from CRAN.
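To make the contrast between the two interval constructions concrete, the following minimal Python sketch pools per-copy estimates in two ways. The first function computes the standard quantities behind Rubin's t interval (pooled estimate, total variance, degrees of freedom). The second treats the completed-data posteriors as an equal-weight mixture of normals and reads interval endpoints off the mixture quantiles. The equal-weight mixture is an assumed, simplified reading of "mixtures of normals" and is not necessarily the authors' exact estimator. It does not use or mirror the MImix package API, and all function names and input numbers are hypothetical.

```python
from statistics import NormalDist, mean, variance


def rubin_pool(estimates, variances):
    """Standard Rubin's-rules quantities for m completed-data analyses."""
    m = len(estimates)
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b   # total variance
    if b == 0:
        df = float("inf")             # no between-imputation spread: normal limit
    else:
        r = (1 + 1 / m) * b / u_bar   # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, total, df


def mixture_cdf(x, estimates, variances):
    """CDF of an equal-weight mixture of N(q_i, u_i) (assumed form)."""
    parts = [NormalDist(q, u ** 0.5).cdf(x) for q, u in zip(estimates, variances)]
    return sum(parts) / len(parts)


def mixture_quantile(p, estimates, variances, tol=1e-10):
    """Invert the mixture CDF by bisection over a generously wide bracket."""
    sds = [u ** 0.5 for u in variances]
    lo = min(q - 10 * s for q, s in zip(estimates, sds))
    hi = max(q + 10 * s for q, s in zip(estimates, sds))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mixture_cdf(mid, estimates, variances) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical per-copy estimates and variances from m = 3 imputed datasets.
estimates = [1.0, 1.2, 0.9]
variances = [0.04, 0.05, 0.045]

q_bar, total, df = rubin_pool(estimates, variances)
lower = mixture_quantile(0.025, estimates, variances)
upper = mixture_quantile(0.975, estimates, variances)
print(f"Rubin: estimate={q_bar:.4f}, total variance={total:.4f}, df={df:.2f}")
print(f"Mixture 95% interval: ({lower:.4f}, {upper:.4f})")
```

Rubin's interval would be the pooled estimate plus or minus a t quantile (with the computed degrees of freedom) times the square root of the total variance. The t quantile is left out here because the Python standard library does not provide one.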