Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: a clinical example.

Authors

van der Heijden Geert J M G, Donders A Rogier T, Stijnen Theo, Moons Karel G M

Affiliation

Julius Center for Health Sciences and Primary Care, University Medical Center, P.O. Box 80035, 3508 GA Utrecht, The Netherlands.

Publication

J Clin Epidemiol. 2006 Oct;59(10):1102-9. doi: 10.1016/j.jclinepi.2006.01.015. Epub 2006 Jul 11.

Abstract

BACKGROUND AND OBJECTIVES

To illustrate the effects of different methods for handling missing data (complete case analysis, the missing-indicator method, single imputation of the unconditional and conditional mean, and multiple imputation [MI]) in multivariable diagnostic research that aims to identify potential predictors (test results) that independently contribute to the prediction of disease presence or absence.

METHODS

We used data from 398 subjects in a prospective study on the diagnosis of pulmonary embolism. Various diagnostic predictors or tests had missing values, in varying percentages. For each method of handling these missing values, we fitted a diagnostic prediction model using multivariable logistic regression analysis.
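
As an illustration only, the four non-MI strategies named above can be sketched on a small synthetic data set. This is a minimal sketch, not the authors' code: the function names, the synthetic predictors, the 15% missingness rate, and the use of linear regression for the conditional mean are all assumptions made for this example.

```python
import numpy as np

def complete_case(X, y):
    """Keep only the rows that have no missing predictor values."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]

def missing_indicator(X):
    """Fill missing values with 0 and append a 0/1 indicator column
    for every predictor that has at least one missing value."""
    miss = np.isnan(X)
    filled = np.where(miss, 0.0, X)
    has_missing = miss.any(axis=0)
    return np.hstack([filled, miss[:, has_missing].astype(float)])

def unconditional_mean(X):
    """Replace each missing value with the observed mean of its column."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def conditional_mean(X, target):
    """Replace missing values in column `target` with predictions from a
    linear regression on the other columns (mean-filled if incomplete)."""
    others = np.delete(unconditional_mean(X), target, axis=1)
    design = np.column_stack([np.ones(len(X)), others])
    observed = ~np.isnan(X[:, target])
    coef, *_ = np.linalg.lstsq(design[observed], X[observed, target], rcond=None)
    out = X.copy()
    out[~observed, target] = design[~observed] @ coef
    return out

# Synthetic data (hypothetical, not the study data): three predictors,
# with values in the second one deleted completely at random.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
X[:, 1] += 0.8 * X[:, 0]                      # make predictor 1 depend on predictor 0
y = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
X_miss = X.copy()
X_miss[rng.random(n) < 0.15, 1] = np.nan

X_cc, y_cc = complete_case(X_miss, y)         # fewer rows
X_ind = missing_indicator(X_miss)             # extra indicator column
X_um = unconditional_mean(X_miss)             # same shape, no gaps
X_cm = conditional_mean(X_miss, target=1)     # same shape, no gaps
```

Each resulting matrix could then be passed to a logistic regression routine to fit one model per method, which is the comparison the study describes.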

RESULTS

The area under the receiver operating characteristic curve was above 0.75 for all diagnostic models. The predictors in the final models based on complete case analysis and on the missing-indicator method differed markedly from those in the other models. The models based on MI did not differ much from the models derived after single conditional and unconditional mean imputation.

CONCLUSION

In multivariable diagnostic research, complete case analysis and the missing-indicator method should be avoided, even when data are missing completely at random. MI methods are known to be superior to single imputation methods. In our example study, the single imputation methods performed equally well, but this was most likely because of the low overall number of missing values.
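
One commonly cited reason MI is preferred over single imputation is that it carries the uncertainty about the imputed values into the standard errors. A minimal sketch of the standard pooling step (Rubin's rules) for one regression coefficient follows. The function name and the input numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    estimates: the coefficient estimated in each imputed data set
    variances: its squared standard error in each imputed data set
    Returns the pooled estimate and its pooled standard error.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()                # pooled point estimate
    within = variances.mean()                # average within-imputation variance
    between = estimates.var(ddof=1)          # between-imputation variance
    total = within + (1 + 1 / m) * between   # total variance
    return pooled, np.sqrt(total)

# Hypothetical coefficients and variances from m = 3 imputed data sets.
pooled_est, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

A single imputation has no between-imputation term, so its standard error reflects only the within-imputation variance and tends to be too small when many values are missing.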
