Schellingerhout Jasper M, Heymans Martijn W, de Vet Henrica C W, Koes Bart W, Verhagen Arianne P
Department of General Practice, Erasmus Medical Centre, Rotterdam, The Netherlands.
J Clin Epidemiol. 2009 Aug;62(8):868-74. doi: 10.1016/j.jclinepi.2008.10.010. Epub 2009 Feb 20.
To evaluate whether different strategies for categorizing continuous variables in multivariable logistic regression analysis result in prognostic models that differ in content and performance.
Backward multivariable logistic regression, with selection thresholds of P<0.05 and P<0.157, was performed on possible predictors of persistent complaints in patients with nonspecific neck pain. Continuous variables were entered into the analysis in three separate ways: (1) as continuous variables, (2) split into multiple categories, and (3) dichotomized. The resulting models were compared with regard to model content, goodness of fit, explained variation, and discriminative ability. We also compared the effect on performance of categorizing before versus after the selection procedure.
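Backward elimination at the two thresholds can be sketched as follows. A frequently cited rationale for P<0.157, assumed here rather than stated in this abstract, is that it matches selection by Akaike's information criterion (AIC) for a term with one degree of freedom: AIC keeps such a term when its likelihood-ratio chi-square exceeds 2, and the upper tail of a chi-square distribution with 1 df beyond 2 is about 0.157. The names `backward_eliminate`, `fit_pvalues` and `ALPHA_AIC` are hypothetical. For a categorized variable coded with several dummy variables, the callback should return one joint p-value per variable, and the 0.157 equivalence then no longer holds exactly.

```python
import math
from typing import Callable, Dict, List


def chi2_1df_upper_tail(x: float) -> float:
    """P(X > x) for X ~ chi-square with 1 df.

    X = Z**2 with Z standard normal, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# AIC penalizes each parameter by 2, so a 1-df term is kept when its
# likelihood-ratio chi-square exceeds 2, i.e. when p < P(chi2_1 > 2).
ALPHA_AIC = chi2_1df_upper_tail(2.0)
print(round(ALPHA_AIC, 3))  # 0.157


def backward_eliminate(
    predictors: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float,
) -> List[str]:
    """Drop the least significant predictor until all remaining have p < alpha.

    fit_pvalues is a hypothetical callback: it refits the logistic model on the
    given predictors and returns one p-value per predictor (a joint test for
    variables represented by several dummy columns).
    """
    current = list(predictors)
    while current:
        pvalues = fit_pvalues(current)
        worst = max(current, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:
            break  # every remaining predictor meets the threshold
        current.remove(worst)  # refit without it on the next pass
    return current
```

With a `fit_pvalues` that wraps any logistic regression routine, calling `backward_eliminate(names, fit_pvalues, 0.05)` and `backward_eliminate(names, fit_pvalues, ALPHA_AIC)` mirrors the two selection settings compared in the study.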
For P<0.05, the final model with continuous variables contained five predictors and differed on three predictors from the models obtained with each of the two categorization strategies. For P<0.157, the model with continuous variables contained six predictors; it differed on three predictors from the model with variables split into multiple categories and on six predictors from the model with dichotomized variables. The models in which the variables were kept continuous performed best. There was no clear difference in performance between categorizing before and after the selection procedure.
Categorizing continuous variables produced final models with different content and poorer performance.
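Part of the performance loss can be illustrated with the c-statistic (here assumed to be the area under the ROC curve, one common measure of discriminative ability) on made-up data. Patients who fall into the same category can no longer be ranked apart. The toy data, the cut-points (4 and 7 for three categories, 6 for dichotomization) and the helper names `categorize` and `dichotomize` are hypothetical, not taken from the study. Each encoded value is used directly as the risk score, standing in for a one-predictor logistic model whose predicted risk rises with the encoded value.

```python
from typing import List, Sequence


def c_statistic(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance (area under the ROC curve) for a binary outcome.

    Over all (event, non-event) pairs, counts how often the event has the
    higher score; tied scores count as one half.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def categorize(x: float, cutpoints: Sequence[float]) -> int:
    """Ordinal category index: how many cut-points x reaches or exceeds."""
    return sum(x >= c for c in cutpoints)


def dichotomize(x: float, cutpoint: float) -> int:
    """1 if x is at or above the cut-point, else 0."""
    return int(x >= cutpoint)


# Hypothetical toy data (NOT from the study): a 0-10 baseline score and
# whether complaints persisted at follow-up.
score: List[float] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
persistent: List[int] = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

encodings = {
    "continuous": score,
    "three categories": [categorize(x, [4, 7]) for x in score],
    "dichotomized": [dichotomize(x, 6) for x in score],
}
results = {name: c_statistic(vals, persistent) for name, vals in encodings.items()}
for name, auc in results.items():
    print(f"{name}: {auc:.2f}")
# continuous: 0.88
# three categories: 0.82
# dichotomized: 0.80
```

In this toy data the coarser encodings tie many pairs that the continuous score ranked correctly, and the net effect is a lower c-statistic. In a real analysis the scores would come from the fitted multivariable model rather than from a single raw predictor.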