Baneshi MR, Faramarzi H, Marzban M
Research Center for Modeling in Health, Kerman University of Medical Sciences, Kerman, Iran.
Iran J Public Health. 2012;41(1):66-72. Epub 2012 Jan 31.
Diagnostic models are frequently used to assess the role of risk factors in disease complications and thereby to help avoid those complications. Missing data complicate model building. The aim of this study was to develop a diagnostic model to predict death in HIV/AIDS patients when missing data exist.
HIV patients (n=1460) referred to the Voluntary Counseling and Testing Center (VCT) of Shiraz, southern Iran, during 2004-2009 were recruited. The univariate association between each variable and death was assessed. Only variables with a univariate P < 0.25 were offered to the multifactorial models. First, patients with missing data on candidate variables were deleted (complete-case, or C-C, model). Then missing data were imputed using Multivariate Imputation by Chained Equations (MICE). Logistic regression models were fitted to the C-C data set and to the imputed data sets (MICE model). The models were compared in terms of the number of variables retained in the final model, the width of confidence intervals, and discrimination ability.
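To make the two strategies concrete, the following is a minimal Python sketch, not the authors' code. It assumes a toy table of two continuous variables with `None` marking missing values; the function names (`complete_cases`, `fit_line`, `chained_imputation`) and the data are hypothetical. The imputer is a simplified, two-variable version of the chained-equations idea: each variable is regressed on the other in turn, and missing entries are replaced by the prediction plus random noise. Full MICE software also draws the regression parameters from their posterior distribution, handles categorical variables, and pools estimates across the imputed data sets, none of which is shown here.

```python
import random
import statistics


def complete_cases(rows):
    """Complete-case (C-C) analysis: drop every row that has a missing value."""
    return [list(r) for r in rows if None not in r]


def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD for y regressed on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sd = (rss / max(len(xs) - 2, 1)) ** 0.5
    return intercept, slope, sd


def chained_imputation(rows, n_iter=10, rng=None):
    """Simplified chained-equations imputation for a two-column table."""
    rng = rng or random.Random(0)
    missing = [[v is None for v in r] for r in rows]
    # Step 1: fill each gap with its column mean as a starting value.
    means = [statistics.fmean([r[j] for r in rows if r[j] is not None])
             for j in range(2)]
    data = [[means[j] if r[j] is None else r[j] for j in range(2)]
            for r in rows]
    # Step 2: cycle through the variables, re-imputing each from the other.
    for _ in range(n_iter):
        for target in range(2):
            predictor = 1 - target
            observed = [i for i in range(len(data)) if not missing[i][target]]
            intercept, slope, sd = fit_line(
                [data[i][predictor] for i in observed],
                [data[i][target] for i in observed],
            )
            for i in range(len(data)):
                if missing[i][target]:
                    prediction = intercept + slope * data[i][predictor]
                    # Add noise so imputed values are not artificially precise.
                    data[i][target] = prediction + rng.gauss(0.0, sd)
    return data


# Hypothetical toy data: (x, y) pairs with None marking a missing value.
rows = [(1.0, 2.1), (2.0, None), (3.0, 6.2), (None, 8.1),
        (5.0, 9.8), (6.0, 12.2), (7.0, 13.9), (None, 16.3)]

cc = complete_cases(rows)
lost = 1 - len(cc) / len(rows)
# Several imputed copies, each drawn with a different random seed.
imputed_sets = [chained_imputation(rows, rng=random.Random(seed))
                for seed in range(5)]
print(f"complete cases: {len(cc)} of {len(rows)} rows ({lost:.0%} lost)")
print(f"imputed data sets: {len(imputed_sets)}, rows each: {len(imputed_sets[0])}")
```

In a real analysis the outcome model would be fitted to each imputed data set and the estimates combined, and an established MICE implementation would normally be used rather than this toy version.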
About 22% of the data were lost in the C-C model. The C-C and MICE models retained 2 and 6 variables, respectively. Confidence intervals (CIs) for the C-C model were wider than those for the MICE model. The MICE model showed greater discrimination ability than the C-C model (70% versus 64%).
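The abstract does not state how discrimination was measured. A common choice for logistic models is the c-statistic (area under the ROC curve), and the sketch below assumes that interpretation of the 70% and 64% figures. It computes the proportion of (died, survived) patient pairs in which the patient who died received the higher predicted risk, counting ties as one half. The function name `c_statistic` and the example values are hypothetical.

```python
def c_statistic(scores, outcomes):
    """Concordance (c) statistic: share of case/non-case pairs ranked correctly.

    scores   -- predicted risks from a fitted model
    outcomes -- 1 if the event (here, death) occurred, otherwise 0
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0   # case ranked above non-case
            elif c == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(cases) * len(non_cases))


# Hypothetical predicted risks and observed outcomes for six patients.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.7]
outcomes = [1, 1, 0, 1, 0, 0]
auc = c_statistic(scores, outcomes)
print(f"c-statistic: {auc:.3f}")  # 8 of 9 pairs are concordant: 0.889
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, so under this assumption the reported figures would indicate moderate discrimination for both models.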
The C-C analysis resulted in a loss of power and wide CIs. Once missing data were imputed, more variables reached statistical significance and CIs were narrower. Therefore, we recommend imputation for handling missing data.