Prevention of Disease Complications through Diagnostic Models: How to Tackle the Problem of Missing Data?

Authors

Baneshi MR, Faramarzi H, Marzban M

Affiliation

Research Center for Modeling in Health, Kerman University of Medical Sciences, Kerman, Iran.

Publication information

Iran J Public Health. 2012;41(1):66-72. Epub 2012 Jan 31.

Abstract

BACKGROUND

Diagnostic models are frequently used to assess the role of risk factors in disease complications and thereby help prevent them. Missing data pose a challenge to model building. The aim of this study was to develop a diagnostic model to predict death in HIV/AIDS patients in the presence of missing data.

METHODS

HIV patients (n=1460) referred to the Voluntary Counseling and Testing (VCT) Center of Shiraz, southern Iran, during 2004-2009 were recruited. The univariate association between each variable and death was assessed. Only variables with a univariate P < 0.25 were offered to the multifactorial models. First, patients with missing data on any candidate variable were deleted (complete-case, or C-C, model). Then missing data were imputed using Multivariate Imputation by Chained Equations (MICE). Logistic regression was fitted to the C-C data set and to the imputed data sets (MICE model). The models were compared in terms of the number of variables retained in the final model, the width of confidence intervals, and discrimination ability.
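The abstract gives no implementation details, so the following is only a minimal sketch of the two analyses it contrasts, under stated assumptions. The data are synthetic, the function names (`chained_imputation`, `fit_logistic`, `pool_rubin`) are hypothetical, all predictors are treated as continuous, and each imputation is drawn as a linear prediction plus noise, which is simpler than a full MICE implementation (it does not redraw the regression coefficients). Pooling with Rubin's rules is also an assumption; the abstract does not say how results across imputed data sets were combined.

```python
import numpy as np

def chained_imputation(data, n_iter=10, seed=0):
    """Fill NaNs by cycling linear regressions over the incomplete columns."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()
    # Initialise each missing cell with its column's observed mean.
    filled[missing] = np.nanmean(data, axis=0)[np.nonzero(missing)[1]]
    for _ in range(n_iter):
        for j in np.flatnonzero(missing.any(axis=0)):
            miss = missing[:, j]
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])
            # Regress column j on the other columns, using rows where j was observed.
            beta, *_ = np.linalg.lstsq(design[~miss], filled[~miss, j], rcond=None)
            sigma = np.std(filled[~miss, j] - design[~miss] @ beta)
            # Impute as prediction plus noise so imputed values keep realistic spread.
            filled[miss, j] = design[miss] @ beta + rng.normal(0.0, sigma, miss.sum())
    return filled

def fit_logistic(X, y, n_iter=25):
    """Logistic regression via Newton-Raphson; returns (coefficients, standard errors)."""
    design = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))
        info = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, design.T @ (y - p))
    # Standard errors from the information matrix of the final (converged) iteration.
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))

def pool_rubin(estimates, std_errors):
    """Combine per-imputation estimates and standard errors with Rubin's rules."""
    estimates, std_errors = np.asarray(estimates), np.asarray(std_errors)
    m = len(estimates)
    within = np.mean(std_errors ** 2, axis=0)
    between = np.var(estimates, axis=0, ddof=1)
    return estimates.mean(axis=0), np.sqrt(within + (1.0 + 1.0 / m) * between)

# Synthetic data for illustration only (not the study's data).
rng = np.random.default_rng(42)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * x1 + 0.8 * x2)))).astype(float)
x2_obs = np.where(rng.random(n) < 0.22, np.nan, x2)  # hypothetical missingness in x2

# Complete-case (C-C) analysis: drop every row with a missing predictor.
keep = ~np.isnan(x2_obs)
cc_beta, cc_se = fit_logistic(np.column_stack([x1, x2_obs])[keep], y[keep])

# Imputation analysis: impute several times (outcome included), fit each, then pool.
table = np.column_stack([x1, x2_obs, y])
fits = [fit_logistic(chained_imputation(table, seed=s)[:, :2], y) for s in range(5)]
mi_beta, mi_se = pool_rubin([f[0] for f in fits], [f[1] for f in fits])

print("rows used, C-C vs imputed:", int(keep.sum()), n)
print("C-C approx. 95% CI half-widths:   ", np.round(1.96 * cc_se, 3))
print("pooled approx. 95% CI half-widths:", np.round(1.96 * mi_se, 3))
```

The outcome column is included in the imputation model so that the imputed predictor values preserve their association with the outcome. In practice an established implementation would normally be used instead of hand-written code like this.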

RESULTS

About 22% of the data were lost in the C-C model. The C-C and MICE models retained 2 and 6 variables, respectively. Confidence intervals (C.I.s) from the C-C model were wider than those from the MICE model. The MICE model showed greater discrimination ability than the C-C model (70% versus 64%).
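The abstract reports discrimination as a percentage without naming the measure. Assuming it is the c-statistic (equivalent to the area under the ROC curve for a binary outcome), which is a common choice for logistic models, the sketch below shows how such a figure could be computed. The function name `c_statistic` and the toy risks and outcomes are hypothetical.

```python
def c_statistic(risks, outcomes):
    """Share of (event, non-event) pairs where the event has the higher predicted risk.

    Tied pairs count as half a correct ordering.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))

# Toy predicted risks of death and observed outcomes (1 = died), illustration only.
risks = [0.9, 0.4, 0.7, 0.3, 0.2]
outcomes = [1, 1, 0, 0, 0]
# 5 of the 6 (event, non-event) pairs are ordered correctly.
print(f"c-statistic: {c_statistic(risks, outcomes):.2f}")
```

A value of 0.5 corresponds to ordering patients no better than chance, and 1.0 to perfect separation of those who died from those who survived.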

CONCLUSION

The C-C analysis resulted in a loss of power and wide C.I.s. Once missing data were imputed, more variables reached statistical significance and the C.I.s were narrower. We therefore recommend imputation for handling missing data.


