Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study.

Affiliation

Centre for Statistics in Medicine, University of Oxford, Oxford, UK.

Publication information

BMC Med Res Methodol. 2010 Jan 19;10:7. doi: 10.1186/1471-2288-10-7.

DOI:10.1186/1471-2288-10-7

PMID:20085642

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2824146/

Abstract

BACKGROUND

There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model.

METHODS

Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five proportions of incomplete cases, ranging from 5% to 75%, were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) a data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted, and appropriate estimates of the regression coefficients and model performance measures were obtained.
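The three mechanisms differ only in what the probability of a value being missing depends on. The sketch below illustrates that difference for a single covariate. It is not the authors' simulation code, which imposed multivariate missingness on four covariates. The function name `impose_missing`, the "rank by driver plus Gaussian noise" selection rule and the log-normal example covariate are all assumptions chosen for illustration.

```python
import math
import random


def impose_missing(x, z, mechanism, rate, seed=0):
    """Return a copy of covariate x with a proportion `rate` set to None.

    Hypothetical helper for illustration only.
    x         -- covariate that becomes incomplete
    z         -- fully observed covariate (drives MAR missingness)
    mechanism -- "MCAR", "MAR" or "MNAR"
    """
    rng = random.Random(seed)
    n = len(x)
    k = round(rate * n)  # number of incomplete cases
    if mechanism == "MCAR":
        # Every case has the same probability of being missing.
        missing = set(rng.sample(range(n), k))
    else:
        if mechanism == "MAR":
            driver = z  # depends only on observed data
        elif mechanism == "MNAR":
            driver = x  # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        # Rank cases by driver plus Gaussian noise: larger driver values are
        # more likely, but not certain, to be made missing.
        ranked = sorted(range(n), key=lambda i: driver[i] + rng.gauss(0.0, 1.0),
                        reverse=True)
        missing = set(ranked[:k])
    return [None if i in missing else v for i, v in enumerate(x)]


# Illustrative data: z is standard normal; x is right-skewed (log-normal)
# and correlated with z.
gen = random.Random(42)
z = [gen.gauss(0.0, 1.0) for _ in range(1000)]
x = [math.exp(0.5 * zi + gen.gauss(0.0, 0.5)) for zi in z]

full_mean = sum(x) / len(x)
observed_means, n_missing = {}, {}
for mech in ("MCAR", "MAR", "MNAR"):
    x_mis = impose_missing(x, z, mech, rate=0.25)
    obs = [v for v in x_mis if v is not None]
    n_missing[mech] = len(x) - len(obs)
    observed_means[mech] = sum(obs) / len(obs)
    print(mech, n_missing[mech], round(observed_means[mech], 3))
print("full", round(full_mean, 3))
```

Under MCAR the observed mean of x should stay close to the full-data mean. Under MAR and MNAR it is pulled downwards because large values are removed preferentially. Under MAR this distortion can be addressed by conditioning on the observed z (for example, by including z in the imputation model). Under MNAR it cannot be addressed without further assumptions about the unobserved values.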

RESULTS

Performing a CC analysis produced unbiased regression estimates but inflated standard errors, which affected the significance of the covariates in the model when 25% or more of the cases were incomplete. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, applying MICE-PMM in general produced the least biased estimates and better coverage for the incomplete covariates, and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more of the cases had missing data imposed with an MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates themselves, i.e. MNAR, all MI approaches gave biased estimates once more than 10% of the cases were incomplete.
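One standard way to see why SI understates uncertainty is through Rubin's rules for combining m completed-data analyses. The abstract does not describe the study's pooling step, so what follows is the textbook rule, not the authors' code. The pooled estimate is the mean of the m estimates, and the total variance is T = W + (1 + 1/m)B. Here W is the average within-imputation variance and B is the between-imputation variance of the estimates. A single imputation treats the imputed values as if they had been observed, so its standard error reflects only the within-dataset part and omits B. The function name `pool_rubin` and the numbers below are hypothetical: they are made up and are not results from the study.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates of one coefficient with Rubin's rules.

    Returns (pooled estimate, within variance W, between variance B, total variance T).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)  # pooled point estimate
    w = statistics.fmean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b              # total variance
    return q_bar, w, b, t


# Made-up log hazard ratios and squared standard errors from m = 5 imputed datasets.
est = [0.42, 0.47, 0.39, 0.45, 0.50]
var = [0.010, 0.011, 0.009, 0.010, 0.012]
q_bar, w, b, t = pool_rubin(est, var)
print(f"pooled estimate {q_bar:.3f}")
print(f"SE from one imputed dataset {math.sqrt(var[0]):.3f}")
print(f"pooled SE {math.sqrt(t):.3f}")
```

The pooled standard error exceeds the standard error from any single completed dataset by an amount driven by B. Confidence intervals that ignore this term are too narrow, which is consistent with the poor coverage reported for SI.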

CONCLUSION

The results from this simulation study suggest that performing MICE-PMM may be the preferred MI approach provided that less than 50% of the cases have missing data and the missing data are not MNAR.
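Predictive mean matching imputes each missing value by copying an observed value from a "donor" whose regression prediction is close to that of the incomplete case. Imputations therefore never fall outside the observed data, which is one reason the method is attractive for skewed covariates. The sketch below is a simplified, single-predictor illustration. It is not the implementation in the R mice package, and the names `ols` and `pmm_impute`, the bootstrap step, the donor pool size `k=5` and the example data are assumptions. The bootstrap resampling of complete cases stands in for the parameter draw that proper MI performs, so that donors vary between imputations.

```python
import math
import random


def ols(xs, ys):
    """Least-squares fit of ys on a single predictor xs; returns (intercept, slope)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def pmm_impute(y, z, k=5, seed=0):
    """Fill the None entries of y by predictive mean matching on predictor z.

    Hypothetical, simplified helper for illustration only.
    """
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    mis = [i for i, v in enumerate(y) if v is None]
    # Fit the regression on a bootstrap resample of the complete cases so the
    # fitted line, and hence the donors, vary between imputations.
    boot = [rng.choice(obs) for _ in obs]
    a, b = ols([z[i] for i in boot], [y[i] for i in boot])
    pred = [a + b * zi for zi in z]
    out = list(y)
    for i in mis:
        # Donor pool: the k complete cases whose predicted means are closest.
        donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
        # Copy an observed value from a randomly chosen donor, so imputations
        # stay within the observed range and keep the skewed shape.
        out[i] = y[rng.choice(donors)]
    return out


# Illustrative data: a right-skewed y that depends on z; every 4th value missing.
gen = random.Random(1)
z = [gen.gauss(0.0, 1.0) for _ in range(200)]
y = [math.exp(zi + gen.gauss(0.0, 0.3)) for zi in z]
y_mis = [None if i % 4 == 0 else v for i, v in enumerate(y)]

# m = 5 imputed datasets, each generated with a different seed.
imputations = [pmm_impute(y_mis, z, k=5, seed=s) for s in range(5)]
```

In a full MICE-PMM analysis, each incomplete covariate would be imputed in turn, conditional on all the others and the outcome information. The Cox model would then be fitted to every completed dataset before pooling.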

Similar articles

Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study.

BMC Med Res Methodol. 2010 Dec 31;10:112. doi: 10.1186/1471-2288-10-112.

Approaches for missing covariate data in logistic regression with MNAR sensitivity analyses.

Biom J. 2020 Jul;62(4):1025-1037. doi: 10.1002/bimj.201900117. Epub 2020 Jan 20.

Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer.

Br J Cancer. 2011 Feb 15;104(4):693-9. doi: 10.1038/sj.bjc.6606078. Epub 2011 Jan 25.

Multiple imputation using auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random.

BMC Med Res Methodol. 2024 Oct 7;24(1):231. doi: 10.1186/s12874-024-02353-9.

Is using multiple imputation better than complete case analysis for estimating a prevalence (risk) difference in randomized controlled trials when binary outcome observations are missing?

Trials. 2016 Jul 22;17:341. doi: 10.1186/s13063-016-1473-3.

Imputation of missing covariate in randomized controlled trials with a continuous outcome: Scoping review and new results.

Pharm Stat. 2020 Nov;19(6):840-860. doi: 10.1002/pst.2041. Epub 2020 Jun 8.

A Bayesian Latent Variable Selection Model for Nonignorable Missingness.

Multivariate Behav Res. 2022 Mar-May;57(2-3):478-512. doi: 10.1080/00273171.2021.1874259. Epub 2021 Feb 2.

Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values.

Stat Med. 2010 Dec 10;29(28):2920-31. doi: 10.1002/sim.3944.

Heckman imputation models for binary or continuous MNAR outcomes and MAR predictors.

BMC Med Res Methodol. 2018 Aug 31;18(1):90. doi: 10.1186/s12874-018-0547-1.

Cited by

Combining Missing Data Imputation and Internal Validation in Clinical Risk Prediction Models.

Stat Med. 2025 Aug;44(18-19):e70203. doi: 10.1002/sim.70203.

Comprehensive reporting guidelines and checklist for studies developing and utilizing artificial intelligence models.

Korean J Anesthesiol. 2025 Jun;78(3):199-214. doi: 10.4097/kja.25075. Epub 2025 Mar 26.

Early-Life Factors and Body Mass Index Trajectories Among Children in the ECHO Cohort.

JAMA Netw Open. 2025 May 1;8(5):e2511835. doi: 10.1001/jamanetworkopen.2025.11835.

Comparison of different approaches in handling missing data in longitudinal multiple-item patient-reported outcomes: a simulation study.

Health Qual Life Outcomes. 2025 Apr 5;23(1):34. doi: 10.1186/s12955-025-02364-0.

Perceived risk of type 2 diabetes: Using linked genomic, clinical and questionnaire data to understand the potential use of genetic risk tools in British South Asians.

PLOS Glob Public Health. 2025 Mar 31;5(3):e0004274. doi: 10.1371/journal.pgph.0004274. eCollection 2025.

Risk and Resilience Trajectories from Adverse Childhood Experience Among Men Who Have Sex with Men Living with HIV.

Behav Med. 2025 Mar 25:1-12. doi: 10.1080/08964289.2025.2480562.

Risk prediction models for dental caries in children and adolescents: a systematic review and meta-analysis.

BMJ Open. 2025 Mar 5;15(3):e088253. doi: 10.1136/bmjopen-2024-088253.

Exploring the role of psychological flexibility in relationship functioning among couples coping with prostate cancer: a cross-sectional study.

Support Care Cancer. 2025 Feb 13;33(3):186. doi: 10.1007/s00520-025-09229-8.

ASA score is an independent predictor of 1-year outcome after moderate-to-severe traumatic brain injury.

Scand J Trauma Resusc Emerg Med. 2025 Feb 6;33(1):25. doi: 10.1186/s13049-025-01338-x.

Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records.

Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024.
