Knuiman M W, Vu H T, Segal M R
Department of Public Health, University of Western Australia, Nedlands, Australia.
J Cardiovasc Risk. 1997 Apr;4(2):127-34.
Logistic regression and, more recently, Cox regression have been the predominant methods for identifying risk factors and developing risk estimation equations for coronary heart disease (CHD). Software for the regression tree method is now available for binary and survival outcomes and thus offers an alternative methodology. This paper compares these four methods for identifying significant risk factors from among a set of candidate factors and for estimating the risk of death from CHD using baseline and mortality follow-up data on 1,701 men participating in the Busselton Health Study. The candidate risk factors were age, body mass index, systolic and diastolic blood pressure, treatment for hypertension, cholesterol and smoking.
Logistic regression, Cox proportional hazards regression, binary regression tree, and survival regression tree analyses were applied to data from the same cohort of men to estimate and predict the risk of death from CHD. The four methods were compared in terms of the variables selected, the goodness-of-fit of the models, the similarity of cross-validated estimated risks for individuals, and the ability to discriminate between those who died from CHD during the follow-up period and those who did not, including a comparison of Receiver Operating Characteristic (ROC) curves.
Age and a blood pressure variable were selected by all four methods; in addition, body mass index was selected by the regression tree methods and smoking by Cox regression. Agreement between methods in the estimated risks for individuals was good, but not excellent. The areas under the ROC curves were 0.66 for the binary tree method, 0.72 for logistic regression, 0.71 for the survival tree method and 0.78 for Cox regression. The average differences in estimated risk between those who died from CHD during the follow-up period and those who did not were 0.051 for logistic regression, 0.070 for the binary tree method, 0.073 for the survival tree method and 0.088 for Cox regression.
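The two discrimination measures reported here, the area under the ROC curve and the average difference in estimated risk between those who died and those who did not, can be illustrated with a minimal sketch. It is not the authors' software: the function names `auc` and `mean_risk_difference` and the six toy risk values are hypothetical and chosen only to show the arithmetic. The area under the curve is computed as the proportion of (died, survived) pairs in which the person who died was assigned the higher estimated risk, with ties counted as one half.

```python
def split_by_outcome(risks, died):
    """Separate estimated risks into those who died and those who did not."""
    cases = [r for r, d in zip(risks, died) if d]
    controls = [r for r, d in zip(risks, died) if not d]
    return cases, controls


def auc(risks, died):
    """Area under the ROC curve as a pairwise concordance proportion."""
    cases, controls = split_by_outcome(risks, died)
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0   # the person who died had the higher risk
            elif c == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(cases) * len(controls))


def mean_risk_difference(risks, died):
    """Mean estimated risk among deaths minus mean among survivors."""
    cases, controls = split_by_outcome(risks, died)
    return sum(cases) / len(cases) - sum(controls) / len(controls)


# Hypothetical toy data (NOT from the Busselton Health Study):
# estimated risks for six individuals and whether each died from CHD.
toy_risks = [0.02, 0.05, 0.10, 0.20, 0.25, 0.30]
toy_died = [0, 0, 1, 0, 1, 1]

toy_auc = auc(toy_risks, toy_died)
toy_diff = mean_risk_difference(toy_risks, toy_died)
print(f"AUC = {toy_auc:.3f}, mean risk difference = {toy_diff:.3f}")
```

An area of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so larger values of either measure indicate that a method separates the two groups more clearly.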
For a moderately sized cohort, typical of many applications of these methods in the literature, the two methods that used the survival outcome performed better than the two that used a binary outcome. Despite selecting some different variables and showing moderate differences in risk estimates for individuals, the two binary approaches were similar in performance. Cox regression appeared to be superior to the survival tree method, but larger studies that use completely separate samples for model development and for evaluation of prediction performance are required to confirm this finding.