Costanza Michael C, Paccaud Fred
Division of Clinical Epidemiology, Geneva University Hospitals, Geneva, Switzerland.
BMC Med Res Methodol. 2004 Apr 6;4:7. doi: 10.1186/1471-2288-4-7.
We sought to improve upon previously published statistical modeling strategies for the binary classification of dyslipidemia, for general population screening purposes, based on two anthropometric measurements: the waist-to-hip circumference ratio and body mass index.
Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the ratio of total serum cholesterol to high-density lipoprotein cholesterol. In addition to the waist-to-hip circumference ratio and body mass index, the potential predictor variables were gender, age, current cigarette smoking, and hypertension. Four types of model were investigated: (i) linear regression; (ii) logistic classification; (iii) regression trees; and (iv) classification trees (iii and iv are collectively known as "CART"). The binary classification performance of each region-specific model was externally validated by classifying the subjects from the other region.
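The cross-region external validation described above can be illustrated with a minimal sketch. It is not the authors' implementation. The data are synthetic, the assumed link between waist-to-hip ratio and the outcome is invented for illustration, and a one-split "stump" stands in for a full classification tree. The names `make_region`, `fit_stump`, and `correct_rate` are hypothetical.

```python
import random

def make_region(n, shift, seed):
    """Synthetic (waist-to-hip ratio, dyslipidemia flag) pairs; NOT study data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        whr = rng.gauss(0.88 + shift, 0.07)
        # Assumed relationship: a higher ratio raises the chance of the positive class.
        p = min(max(0.3 + 2.0 * (whr - 0.88), 0.05), 0.95)
        rows.append((whr, 1 if rng.random() < p else 0))
    return rows

def fit_stump(rows):
    """One-split classification tree: pick the cutoff with the most correct training labels."""
    best_cut, best_correct = None, -1
    for cut in sorted({whr for whr, _ in rows}):
        # Predict the positive class when the ratio is at or above the cutoff.
        correct = sum((whr >= cut) == bool(y) for whr, y in rows)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

def correct_rate(cut, rows):
    """Fraction of subjects whose predicted class matches the observed class."""
    return sum((whr >= cut) == bool(y) for whr, y in rows) / len(rows)

# Two hypothetical regions with slightly different ratio distributions.
region_a = make_region(300, 0.00, seed=1)
region_b = make_region(300, 0.02, seed=2)

# Fit one model per region, then validate each on the other region's subjects.
cut_a = fit_stump(region_a)
cut_b = fit_stump(region_b)
rate_a_on_b = correct_rate(cut_a, region_b)
rate_b_on_a = correct_rate(cut_b, region_a)
print("model A applied to region B:", round(rate_a_on_b, 3))
print("model B applied to region A:", round(rate_b_on_a, 3))
```

Comparing each model's correct classification rate in its own region with its rate in the other region gives a rough indication of how stable the fitted rule is across populations.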
Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60-80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models.
There were no striking differences between the algebraic (i, ii) and non-algebraic (iii, iv) modeling approaches, or between the regression (i, iii) and classification (ii, iv) approaches. The advantages of CART over simple additive linear and logistic models were smaller than anticipated in this particular application, which involved a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions among larger sets of predictor variables.