Prevention of Disease Complications through Diagnostic Models: How to Tackle the Problem of Missing Data?

Author information

Baneshi MR, Faramarzi H, Marzban M

Affiliation

Research Center for Modeling in Health, Kerman University of Medical Sciences, Kerman, Iran.

Publication information

Iran J Public Health. 2012;41(1):66-72. Epub 2012 Jan 31.

PMID:23113124

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3481660/

Abstract

BACKGROUND

Diagnostic models are frequently used to assess the role of risk factors in disease complications, and thereby to help prevent them. Missing data pose a challenge to model building. The aim of this study was to develop a diagnostic model to predict death in HIV/AIDS patients in the presence of missing data.

METHODS

HIV patients (n=1460) referred to the Voluntary Counseling and Testing (VCT) Center of Shiraz, southern Iran, during 2004-2009 were recruited. Univariate associations between candidate variables and death were assessed, and only variables with a univariate P < 0.25 were offered to the multifactorial models. First, patients with missing data on any candidate variable were deleted, giving the complete-case (C-C) data set. Then missing data were imputed using Multivariate Imputation by Chained Equations (MICE). Logistic regression was fitted to the C-C data set (C-C model) and to the imputed data sets (MICE model). The models were compared in terms of the number of variables retained in the final model, the width of the confidence intervals, and discrimination ability.
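The two ways of handling missing values described above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not name the software or the variables, so the data here are synthetic, and the helper names (`fit_line`, `complete_cases`, `chained_imputation`) are hypothetical. The imputation step is a deliberately simplified version of chained equations for two continuous columns; a full MICE procedure also draws the regression parameters from their posterior distribution and pools the estimates across imputed data sets with Rubin's rules.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(len(xs) - 2, 1)) ** 0.5
    return a, b, sd


def complete_cases(rows):
    """Complete-case (C-C) data set: drop every row with a missing value."""
    return [r for r in rows if None not in r]


def chained_imputation(rows, n_iter=10, seed=0):
    """Return one imputed copy of a two-column data set (None = missing)."""
    rng = random.Random(seed)
    data = [list(r) for r in rows]
    missing = [[v is None for v in r] for r in rows]
    # Start by filling each missing cell with its column's observed mean.
    for j in range(2):
        mean_j = statistics.fmean(r[j] for r in rows if r[j] is not None)
        for r in data:
            if r[j] is None:
                r[j] = mean_j
    # Cycle through the columns, re-imputing each from the other one.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j
            # Fit on rows where column j was actually observed.
            train = [(r[k], r[j]) for r, m in zip(data, missing) if not m[j]]
            a, b, sd = fit_line([t[0] for t in train], [t[1] for t in train])
            for r, m in zip(data, missing):
                if m[j]:
                    # Prediction plus residual noise keeps the variability.
                    r[j] = a + b * r[k] + rng.gauss(0, sd)
    return data


# Synthetic example (assumed data, not the study's): about 30% of rows
# lose one of their two values.
rng = random.Random(1)
rows = []
for _ in range(200):
    x = rng.gauss(0, 1)
    y = 2 * x + rng.gauss(0, 0.5)
    u = rng.random()
    if u < 0.15:
        x = None
    elif u < 0.30:
        y = None
    rows.append((x, y))

cc = complete_cases(rows)
imputed_sets = [chained_imputation(rows, seed=s) for s in range(5)]
print(f"complete cases: {len(cc)} of {len(rows)} rows")
print(f"imputed data sets: {len(imputed_sets)}, each with {len(imputed_sets[0])} rows")
```

Complete-case deletion shrinks the analysis set, whereas each imputed copy keeps all rows; the analysis model would then be fitted to every imputed copy and the results combined.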

RESULTS

About 22% of the data were lost in the C-C model. The C-C and MICE models retained 2 and 6 variables, respectively. Confidence intervals (CIs) for the C-C model were wider than those for the MICE model. The MICE model showed greater discrimination ability than the C-C model (70% versus 64%).
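The abstract reports discrimination as a percentage without naming the measure. Assuming it refers to the concordance (c) statistic, which for a binary outcome equals the area under the ROC curve, the following sketch shows how such a figure can be computed from observed outcomes and predicted risks. The function name `c_statistic` and the toy data are illustrative only and are not taken from the study.

```python
def c_statistic(y_true, y_score):
    """Share of (event, non-event) pairs in which the event has the
    higher predicted risk; tied pairs count as one half."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy data (assumed): 1 = died, 0 = survived, with model-predicted risks.
died = [1, 1, 1, 0, 0, 0, 0]
risk = [0.9, 0.6, 0.3, 0.7, 0.2, 0.2, 0.1]
print(f"c-statistic: {c_statistic(died, risk):.2f}")
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of patients who died from those who survived.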

CONCLUSION

The C-C analysis resulted in a loss of power and wide CIs. Once missing data were imputed, more variables reached statistical significance and the CIs were narrower. We therefore recommend applying imputation methods to handle missing data.
