Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study.

Affiliation

Centre for Statistics in Medicine, University of Oxford, Oxford, UK.

Publication information

BMC Med Res Methodol. 2010 Jan 19;10:7. doi: 10.1186/1471-2288-10-7.

Abstract

BACKGROUND

There is no consensus on the most appropriate approach to handling missing covariate data within prognostic modelling studies. Therefore, a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model.

METHODS

Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five proportions of incomplete cases, from 5% to 75%, were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted and appropriate estimates for the regression coefficients and model performance measures were obtained.
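
The three missingness mechanisms named above can be illustrated with a minimal sketch. This is not the simulation design used in the study: the function name `impose_missing`, the logistic missingness model, and the `rate` and `strength` parameters are all assumptions made for illustration. Under MCAR the probability of a value being missing is constant, under MAR it depends on a fully observed covariate `z`, and under MNAR it depends on the incomplete covariate `x` itself.

```python
import math
import random


def impose_missing(x, z, mechanism, rate=0.25, strength=2.0, seed=0):
    """Return a copy of x with some entries replaced by None.

    x is the covariate to be made incomplete and z is a fully observed
    covariate. Hypothetical illustration only: the logistic model and its
    parameters are assumptions, not the study's design.
    """
    rng = random.Random(seed)

    def standardise(values):
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
        return [(v - mean) / sd for v in values]

    if mechanism == "MCAR":
        driver = [0.0] * len(x)   # missingness unrelated to any data
    elif mechanism == "MAR":
        driver = standardise(z)   # depends on an observed covariate
    elif mechanism == "MNAR":
        driver = standardise(x)   # depends on the value that goes missing
    else:
        raise ValueError("mechanism must be 'MCAR', 'MAR' or 'MNAR'")

    # Intercept chosen so the probability equals `rate` when the driver is 0;
    # under MAR and MNAR the realised overall rate is only approximately `rate`.
    intercept = math.log(rate / (1.0 - rate))
    result = []
    for value, d in zip(x, driver):
        p_missing = 1.0 / (1.0 + math.exp(-(intercept + strength * d)))
        result.append(None if rng.random() < p_missing else value)
    return result
```

With a positive `strength`, the MNAR setting removes large values of `x` preferentially, so the observed values are no longer representative of the full distribution. This is the situation in which the abstract reports biased estimates for every MI approach.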

RESULTS

Performing a CC analysis produced unbiased regression estimates but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, MICE-PMM produced, in general, the least biased estimates and better coverage for the incomplete covariates, and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more cases had missing data imposed with an MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches.
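
The matching idea behind predictive mean matching can be sketched in a few lines. This is a simplified, single-covariate illustration and not the MICE implementation evaluated in the study: the function name `pmm_impute`, the donor pool size `k` and the single predictor `z` are assumptions. It also omits the random draw of regression parameters that a proper multiple imputation procedure would perform before each imputation. Each missing value is replaced by an observed value taken from a case whose predicted value is close to the prediction for the incomplete case.

```python
import random


def pmm_impute(x, z, k=5, seed=0):
    """Fill None entries of x using predictive mean matching on z.

    Simplified, hypothetical sketch: one predictor, ordinary least squares,
    and no draw of regression parameters from their posterior.
    """
    rng = random.Random(seed)
    observed = [(zi, xi) for xi, zi in zip(x, z) if xi is not None]
    n = len(observed)
    mean_z = sum(zi for zi, _ in observed) / n
    mean_x = sum(xi for _, xi in observed) / n
    szz = sum((zi - mean_z) ** 2 for zi, _ in observed)
    sxz = sum((zi - mean_z) * (xi - mean_x) for zi, xi in observed)
    slope = sxz / szz if szz else 0.0
    intercept = mean_x - slope * mean_z

    # Pair every observed value with its predicted value (the donor pool).
    donors = [(intercept + slope * zi, xi) for zi, xi in observed]

    completed = []
    for xi, zi in zip(x, z):
        if xi is not None:
            completed.append(xi)
            continue
        target = intercept + slope * zi
        # Take the k donors whose predictions are closest to the target,
        # then borrow the observed value of one of them at random.
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        completed.append(rng.choice(nearest)[1])
    return completed
```

Because every imputed value is borrowed from an observed case, imputations stay within the range and shape of the observed data. That property is one plausible reason the approach copes comparatively well with skewed covariates such as those described above.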

CONCLUSION

The results from this simulation study suggest that performing MICE-PMM may be the preferred MI approach provided that less than 50% of the cases have missing data and the missing data are not MNAR.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7b4/2824146/f357cb58fa57/1471-2288-10-7-1.jpg
