Austin, Peter C.
Institute for Clinical Evaluative Sciences, Toronto, Ont., Canada.
Stat Med. 2007 Jul 10;26(15):2937-57. doi: 10.1002/sim.2770.
Clinicians and health service researchers are frequently interested in predicting patient-specific probabilities of adverse events (e.g. death, disease recurrence, post-operative complications, hospital readmission). There is increasing interest in the use of classification and regression trees (CART) for predicting outcomes in clinical studies. We compared the predictive accuracy of logistic regression with that of regression trees for predicting mortality after hospitalization with an acute myocardial infarction (AMI). We also examined the predictive ability of two other types of data-driven models: generalized additive models (GAMs) and multivariate adaptive regression splines (MARS). We used data on 9484 patients admitted to hospital with an AMI in Ontario. We used repeated split-sample validation: the data were randomly divided into derivation and validation samples, predictive models were estimated in the derivation sample, and the predictive accuracy of each resulting model was assessed in the validation sample using the area under the receiver operating characteristic (ROC) curve. This process was repeated 1000 times, each time with a new random division of the data into derivation and validation samples and a new assessment of each method's predictive accuracy. The mean ROC curve area for the regression tree models in the 1000 derivation samples was 0.762, while the mean ROC curve area of a simple logistic regression model was 0.845. The mean ROC curve areas for the other methods ranged from a low of 0.831 to a high of 0.851. Our study shows that regression trees do not perform as well as logistic regression for predicting mortality following AMI. However, the logistic regression model had performance comparable to that of more flexible, data-driven models such as GAMs and MARS.
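The repeated split-sample validation described in the abstract can be sketched in Python. This is a minimal, standard-library-only illustration under stated assumptions, not the study's code: the 50/50 derivation/validation split, the gradient-descent logistic fit (standing in for a standard maximum-likelihood fit), the helper names (`auc`, `fit_logistic`, `predict`, `repeated_split_auc`) and the synthetic data are all hypothetical, and the regression tree, GAM and MARS comparators are omitted. The ROC curve area is computed in its concordance (Mann-Whitney) form: the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half.

```python
import math
import random
from statistics import mean


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(scores, labels):
    """ROC curve area as P(event scores higher than non-event); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Hypothetical stand-in: unpenalized logistic regression by batch gradient descent."""
    k = len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, X):
    """Predicted probability of the event for each row of X."""
    return [_sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def repeated_split_auc(X, y, n_repeats=1000, derivation_frac=0.5, seed=0):
    """Each repeat: random split, fit on derivation, ROC area on validation."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    areas = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cut = int(derivation_frac * len(idx))
        dev, val = idx[:cut], idx[cut:]
        y_val = [y[i] for i in val]
        if len(set(y_val)) < 2:
            continue  # ROC area is undefined without both outcomes
        w = fit_logistic([X[i] for i in dev], [y[i] for i in dev])
        areas.append(auc(predict(w, [X[i] for i in val]), y_val))
    return mean(areas), areas


# Synthetic data (hypothetical, not the Ontario AMI cohort): two covariates,
# only the first of which raises the log-odds of death.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(300)]
y = [1 if data_rng.random() < _sigmoid(-1.0 + 1.5 * a) else 0 for a, _ in X]
mean_auc, areas = repeated_split_auc(X, y, n_repeats=20)
print(f"mean validation ROC area over {len(areas)} splits: {mean_auc:.3f}")
```

To compare several methods as in the study, each method's fit and predict steps would be run on the same random splits (for example, by reusing the same `seed`), so that differences in mean ROC area reflect the models rather than the particular divisions of the data.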