Clinical Epidemiology & Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road Parkville, Melbourne, Victoria 3052, Australia.
BMC Med Res Methodol. 2013 Nov 20;13:144. doi: 10.1186/1471-2288-13-144.
Multiple imputation (MI) is becoming increasingly popular as a strategy for handling missing data, but there is a scarcity of tools for checking the adequacy of imputation models. The Kolmogorov-Smirnov (KS) test has been identified as a potential diagnostic method for assessing whether the distribution of imputed data deviates substantially from that of the observed data. The aim of this study was to evaluate the performance of the KS test as an imputation diagnostic.
Using simulation, we examined whether the KS test could reliably identify departures from assumptions made in the imputation model. To do this, we examined how the p-values from the KS test behaved when skewed and heavy-tailed data were imputed using a normal imputation model. We varied the amount of missing data, the missing data models and the amount of skewness, and evaluated the performance of the KS test in diagnosing issues with the imputation models under these different scenarios.
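The kind of check described above can be illustrated with a minimal sketch. It is not the authors' simulation code: the data-generating model, the logistic missingness mechanism, the function name `simulate_ks_diagnostic` and all parameter values are assumptions chosen for illustration. It also performs a single stochastic regression imputation with fixed parameter estimates, rather than proper MI with parameter draws.

```python
import numpy as np
from scipy import stats


def simulate_ks_diagnostic(n=1000, skew_sigma=1.0, mar_strength=1.0, seed=0):
    """Impute a skewed variable with a normal model, then compare
    observed and imputed values with a two-sample KS test.

    All settings are illustrative assumptions, not the study's design.
    """
    rng = np.random.default_rng(seed)

    # Fully observed covariate.
    x = rng.normal(size=n)

    # Incomplete variable with skewed (log-normal) errors.
    y = 0.5 * x + rng.lognormal(mean=0.0, sigma=skew_sigma, size=n)

    # Missing at random: the probability of missingness depends only on x.
    p_miss = 1.0 / (1.0 + np.exp(-(-1.0 + mar_strength * x)))
    missing = rng.random(n) < p_miss
    observed = ~missing

    # Normal linear imputation model fitted to the observed cases.
    X_obs = np.column_stack([np.ones(observed.sum()), x[observed]])
    beta, *_ = np.linalg.lstsq(X_obs, y[observed], rcond=None)
    resid = y[observed] - X_obs @ beta
    sigma = resid.std(ddof=2)

    # Imputed values: predicted mean plus normal noise.
    X_mis = np.column_stack([np.ones(missing.sum()), x[missing]])
    y_imp = X_mis @ beta + rng.normal(scale=sigma, size=missing.sum())

    # Two-sample KS test: observed values versus imputed values.
    result = stats.ks_2samp(y[observed], y_imp)
    return result.statistic, result.pvalue


if __name__ == "__main__":
    for strength in (0.0, 1.0, 2.0):
        stat, p = simulate_ks_diagnostic(mar_strength=strength)
        print(f"MAR strength {strength}: KS statistic={stat:.3f}, p-value={p:.4g}")
```

Varying `n`, `skew_sigma` and `mar_strength` in this sketch gives a rough sense of how the KS p-value might respond to sample size, skewness and the strength of the missingness dependency.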
The KS test was able to flag differences between the observed and imputed values; however, these differences did not always correspond to problems with MI inference for the regression parameter of interest. When there was a strong missing at random dependency, the KS p-values were very small regardless of whether the MI estimates were biased, so the KS test could not discriminate between imputed variables that required further investigation and those that did not. The p-values were also sensitive to sample size and the proportion of missing data, adding to the challenge of interpreting the results from the KS test.
Given our study results, it is difficult to establish guidelines or recommendations for using the KS test as a diagnostic tool for MI. The investigation of other imputation diagnostics and their incorporation into statistical software are important areas for future research.