White Ian R, Daniel Rhian, Royston Patrick
MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 0SR, UK.
Medical Statistics Unit, London School of Hygiene and Tropical Medicine, London, UK.
Comput Stat Data Anal. 2010 Oct 1;54(10):2267-2275. doi: 10.1016/j.csda.2010.04.005.
Multiple imputation is a popular way to handle missing data. Automated procedures are widely available in standard software. However, such automated procedures may hide many assumptions and possible difficulties from the view of the data analyst. Imputation procedures such as monotone imputation and imputation by chained equations often involve the fitting of a regression model for a categorical outcome. If perfect prediction occurs in such a model, then automated procedures may give severely biased results. This is a problem in some standard software, but it may be avoided by bootstrap methods, penalised regression methods, or a new augmentation procedure.
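As a rough illustration of what perfect prediction means for a categorical predictor, the sketch below flags predictor levels in which the observed binary outcome never varies. In such a level the maximum likelihood estimate of the corresponding log odds is infinite, which is the situation the abstract warns about. The function name `perfectly_predicted_levels` and the toy data are hypothetical and are not taken from the paper. This is only a diagnostic check under those assumptions, not the bootstrap, penalised regression, or augmentation remedies the abstract mentions.

```python
from collections import defaultdict


def perfectly_predicted_levels(predictor, outcome):
    """Return {level: outcome_value} for predictor levels whose observed
    outcomes all take a single value (a simple sign of perfect prediction).

    Records with a missing outcome (None) are ignored, since only observed
    outcomes would enter the imputation model's regression fit.
    """
    seen = defaultdict(set)
    for level, y in zip(predictor, outcome):
        if y is not None:
            seen[level].add(y)
    # A level is flagged when exactly one distinct outcome value was observed.
    return {level: next(iter(values))
            for level, values in seen.items()
            if len(values) == 1}


# Hypothetical toy data: level "b" always has event = 1, and the single
# observed record in level "c" has event = 0.
group = ["a", "a", "a", "b", "b", "b", "c", "c"]
event = [0, 1, 0, 1, 1, 1, 0, None]

flagged = perfectly_predicted_levels(group, event)
print(flagged)
```

A check like this could be run on each categorical imputation model before trusting automated output. Flagged levels suggest that one of the remedies named in the abstract may be needed.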