Nguyen Cattram D, Lee Katherine J, Carlin John B
Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road, Parkville, Victoria, 3052, Australia.
Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, The Royal Children's Hospital, Flemington Road, Parkville, Victoria, 3052, Australia.
Biom J. 2015 Jul;57(4):676-94. doi: 10.1002/bimj.201400034. Epub 2015 May 3.
Multiple imputation is gaining popularity as a strategy for handling missing data, but there is a scarcity of tools for checking imputation models, a critical step in model fitting. Posterior predictive checking (PPC) has been recommended as an imputation diagnostic. PPC involves simulating "replicated" data from the posterior predictive distribution of the model under scrutiny. Model fit is assessed by examining whether the analysis from the observed data appears typical of results obtained from the replicates produced by the model. A proposed diagnostic measure is the posterior predictive "p-value", an extreme value of which (i.e., a value close to 0 or 1) suggests a misfit between the model and the data. The aim of this study was to evaluate the performance of the posterior predictive p-value as an imputation diagnostic. Using simulation methods, we deliberately misspecified imputation models to determine whether posterior predictive p-values were effective in identifying these problems. When estimating the regression parameter of interest, we found that more extreme p-values were associated with poorer imputation model performance, although the results highlighted that traditional thresholds for classical p-values do not apply in this context. A shortcoming of the PPC method was its reduced ability to detect misspecified models with increasing amounts of missing data. Despite the limitations of posterior predictive p-values, they appear to have a valuable place in the imputer's toolkit. In addition to automated checking using p-values, we recommend imputers perform graphical checks and examine other summaries of the test quantity distribution.
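To make the PPC recipe described in the abstract concrete, the following is a minimal Python sketch for a single incomplete normal outcome: an imputation model is fit to the observed rows, "completed" and "replicated" datasets are drawn from its posterior predictive distribution, and the posterior predictive p-value compares a test quantity (here, the regression slope of interest) between the two. The simulated data, the simple normal-linear imputation model, and helper names such as ols_slope are illustrative assumptions for this sketch, not the authors' actual simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical data: y depends linearly on x; ~30% of y is missing at random ---
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=n)
miss = rng.random(n) < 0.3
y_obs = y.copy()
y_obs[miss] = np.nan


def ols_slope(x, y):
    """Slope of y on x by ordinary least squares (the test quantity T)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]


# --- Imputation model: y | x ~ Normal(b0 + b1*x, sigma^2), fit to observed rows only ---
xo, yo = x[~miss], y_obs[~miss]
Xo = np.column_stack([np.ones_like(xo), xo])
beta_hat, *_ = np.linalg.lstsq(Xo, yo, rcond=None)
df = len(yo) - 2
s2 = np.sum((yo - Xo @ beta_hat) ** 2) / df  # residual variance estimate
XtX_inv = np.linalg.inv(Xo.T @ Xo)

n_rep = 500
count = 0
for _ in range(n_rep):
    # Draw (sigma^2, beta) from the standard noninformative posterior
    # for a normal linear regression.
    sigma2_draw = df * s2 / rng.chisquare(df)
    beta_draw = rng.multivariate_normal(beta_hat, sigma2_draw * XtX_inv)
    sigma_draw = np.sqrt(sigma2_draw)

    # Completed data: fill in only the missing y values from the imputation model.
    y_comp = y_obs.copy()
    y_comp[miss] = (beta_draw[0] + beta_draw[1] * x[miss]
                    + rng.normal(scale=sigma_draw, size=miss.sum()))

    # Replicated data: regenerate *all* y values from the posterior predictive distribution.
    y_rep = beta_draw[0] + beta_draw[1] * x + rng.normal(scale=sigma_draw, size=n)

    # Compare the test quantity between replicated and completed data.
    if ols_slope(x, y_rep) >= ols_slope(x, y_comp):
        count += 1

ppp = count / n_rep
print(f"posterior predictive p-value: {ppp:.2f}")  # values near 0 or 1 suggest misfit
```

As the abstract recommends, such an automated p-value check would in practice be supplemented by graphical comparison of the completed-data and replicated-data test quantities and by other summaries of their distributions.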