Baneshi M R, Talei A R
Department of Biostatistics and Epidemiology, Kerman University of Medical Sciences, Kerman, Iran.
Iran Red Crescent Med J. 2011 Aug;13(8):544-9. Epub 2011 Aug 1.
Missing data is a common problem in cancer research. Simple methods such as complete-case (C-C) analysis are commonly used to handle this problem, but several studies have shown that these methods lead to biased estimates. We aim to address the methodological issues that arise when developing a prognostic model with missing data.
Three hundred and ten breast cancer patients were enrolled. First, patients with missing data on any of the four candidate variables were excluded (complete-case analysis). Second, the missing data were imputed 10 times. A Cox regression model was fitted to the C-C data and to the imputed data sets. The results were compared in terms of the variables retained in the model, discrimination ability, and goodness of fit.
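The complete-case step amounts to dropping every patient with a missing value on any candidate variable. The minimal Python sketch below assumes each patient is a dict and that a missing value is stored as None. The variable names and toy data are hypothetical placeholders, since the abstract does not name the four candidate variables.

```python
from typing import Dict, List, Optional, Sequence

# A patient record: variable name -> value, with None marking a missing value.
Record = Dict[str, Optional[float]]


def complete_cases(records: Sequence[Record], variables: Sequence[str]) -> List[Record]:
    """Return only the records with a non-missing value for every listed variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Hypothetical candidate variables and toy data, for illustration only.
CANDIDATES = ["tumour_size", "grade", "nodes", "age"]

patients: List[Record] = [
    {"tumour_size": 2.1, "grade": 2, "nodes": 0, "age": 50},
    {"tumour_size": None, "grade": 3, "nodes": 4, "age": 61},  # missing size
    {"tumour_size": 1.5, "grade": None, "nodes": 1, "age": 45},  # missing grade
    {"tumour_size": 3.0, "grade": 1, "nodes": 2, "age": 70},
]

kept = complete_cases(patients, CANDIDATES)
print(f"{len(kept)} of {len(patients)} patients retained")
```

Because a patient is discarded when any single variable is missing, the retained sample shrinks quickly as more candidate variables are considered. This is the loss of power the results below refer to.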
Some variables lost statistical significance in the complete-case analysis because of the loss of power, but reached significance after the missing data were imputed. The discrimination ability and goodness of fit of the model fitted to the imputed data sets were higher than those of the complete-case model (C-index 76% versus 72%; likelihood ratio test statistic 51.19 versus 32.44).
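The C-index is a concordance probability. As a hedged illustration, the Python sketch below computes Harrell's C-index for right-censored survival data. This is a common definition, but the abstract does not state which variant was used. A pair of patients counts as comparable when the patient with the shorter observed time had an event; the pair is concordant when that patient also has the higher predicted risk (for a Cox model, the linear predictor). Tied risk scores count one half, and pairs with tied observed times are simply skipped in this sketch.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's C-index: share of comparable pairs ordered correctly by risk."""
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:  # j was still under observation after i's event
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher risk, earlier event: concordant
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count one half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: observed times, event indicators (1 = event, 0 = censored), risk scores.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.2, 0.3, 0.1]))
```

On this scale, 0.76 means the model ranked about 76% of comparable patient pairs in the correct order, compared with 0.5 for random ranking.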
Our findings show that ad hoc complete-case analysis is inappropriate: it led to a loss of power and imprecise estimates. Multiple imputation techniques are recommended to avoid such problems.
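The abstract does not say how the results from the 10 imputed data sets were combined. A standard choice is Rubin's rules, sketched below in Python as an assumption rather than as the authors' procedure. The pooled estimate is the mean of the per-imputation estimates. Its variance is the mean within-imputation variance W plus the between-imputation variance B inflated by (1 + 1/m), where m is the number of imputations. The input numbers in the usage line are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: the coefficient (e.g. a Cox log hazard ratio) from each data set.
    variances: its squared standard error from each data set.
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)       # W: mean within-imputation variance
    between = statistics.variance(estimates)  # B: between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between    # T = W + (1 + 1/m) B
    return q_bar, math.sqrt(total)


# Hypothetical log hazard ratios and variances from three imputed data sets.
est, se = pool_rubin([0.42, 0.47, 0.39], [0.010, 0.011, 0.009])
print(f"pooled log HR = {est:.3f}, SE = {se:.3f}")
```

The between-imputation term is what complete-case analysis and single imputation both ignore: it carries the extra uncertainty caused by not knowing the missing values.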