Missing covariate data in medical research: to impute is better than to ignore.

Affiliation

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.

Publication information

J Clin Epidemiol. 2010 Jul;63(7):721-7. doi: 10.1016/j.jclinepi.2009.12.008. Epub 2010 Mar 24.

Abstract

OBJECTIVE

We compared popular methods for handling missing data with multiple imputation, a more sophisticated method that preserves the observed data.

STUDY DESIGN AND SETTING

We used data from 804 patients with suspected deep venous thrombosis (DVT). We studied three covariates as predictors of the presence of DVT: d-dimer level, difference in calf circumference, and history of leg trauma. We introduced missing values (missing at random) in proportions ranging from 10% to 90%. The risk of DVT was modeled with logistic regression under each of three methods: complete case analysis, exclusion of d-dimer level from the model, and multiple imputation.
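To make the design concrete, the following is a minimal sketch of how missing-at-random values might be introduced and what complete case analysis then discards. It is not the authors' procedure. The synthetic records, the choice of d-dimer as the variable that receives missing values (suggested by the Results, not stated here), the dependence of missingness on leg trauma, and every function name are assumptions made for illustration.

```python
import random


def make_patients(n, seed=0):
    """Generate synthetic (not real) records with the three covariates."""
    rng = random.Random(seed)
    return [
        {
            "d_dimer": rng.lognormvariate(6.0, 1.0),  # hypothetical scale
            "calf_diff_cm": rng.gauss(2.0, 1.5),      # hypothetical scale
            "leg_trauma": rng.random() < 0.15,        # hypothetical prevalence
        }
        for _ in range(n)
    ]


def set_mar_missing(patients, target_fraction, seed=1):
    """Blank out d_dimer with a probability that depends only on the fully
    observed leg_trauma covariate, which is one possible MAR mechanism."""
    rng = random.Random(seed)
    # Assumed mechanism: trauma patients are 1.5 times as likely to be missing.
    weights = [1.5 if p["leg_trauma"] else 1.0 for p in patients]
    mean_weight = sum(weights) / len(weights)
    result = []
    for patient, weight in zip(patients, weights):
        # Scale so the average probability is close to target_fraction;
        # clipping at 1.0 lowers the realized fraction for very high targets.
        prob = min(1.0, target_fraction * weight / mean_weight)
        record = dict(patient)  # copy, so the original data stay intact
        if rng.random() < prob:
            record["d_dimer"] = None
        result.append(record)
    return result


def complete_cases(patients):
    """Complete case analysis keeps only records with every value observed."""
    return [p for p in patients if p["d_dimer"] is not None]


patients = make_patients(804)
with_gaps = set_mar_missing(patients, 0.30)
kept = complete_cases(with_gaps)
missing_fraction = 1 - len(kept) / len(with_gaps)
print(f"records kept: {len(kept)} of {len(with_gaps)}")
print(f"fraction missing: {missing_fraction:.2f}")
```

Every discarded record still carries an observed calf circumference difference and trauma history. That is the information multiple imputation is designed to keep in the analysis.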

RESULTS

Multiple imputation yielded less biased regression coefficients for the three variables, and more accurate coverage of the corresponding 90% confidence intervals, than either complete case analysis or dropping d-dimer level from the analysis. Multiple imputation also gave unbiased estimates of the area under the receiver operating characteristic curve (0.88), compared with 0.77 for complete case analysis and 0.65 when the variable with missing values was dropped.
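The abstract does not say how the results from the imputed data sets were combined. The conventional approach is Rubin's rules, sketched below under that assumption: the pooled coefficient is the mean of the per-imputation estimates, and its variance adds the between-imputation spread to the average within-imputation variance. The function name and the five example values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors
    using Rubin's rules. Returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance across imputations (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical coefficients and standard errors from five imputed data sets.
coefs = [1.0, 1.2, 0.8, 1.1, 0.9]
ses = [0.30, 0.31, 0.29, 0.30, 0.30]

pooled_coef, pooled_se = pool_rubin(coefs, [s ** 2 for s in ses])
print(f"pooled coefficient: {pooled_coef:.3f}")
print(f"pooled standard error: {pooled_se:.3f}")
```

The pooled standard error is larger than any single-imputation standard error because it also reflects uncertainty about the imputed values. A confidence interval built from it would normally use a t reference distribution, which this sketch leaves out.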

CONCLUSION

Because this study shows that simple methods for dealing with missing data can lead to seriously misleading results, we advise considering multiple imputation. The purpose of multiple imputation is not to create data, but to prevent the exclusion of observed data.
