Cooper Jennifer N, Wei Lai, Fernandez Soledad A, Minneci Peter C, Deans Katherine J
Center for Surgical Outcomes Research, The Research Institute at Nationwide Children's Hospital, 700 Children's Dr., Columbus, OH 43205, USA.
Center for Biostatistics, The Ohio State University, 2012 Kenny Road, Columbus, OH 43221, USA.
Comput Biol Med. 2015 Feb;57:54-65. doi: 10.1016/j.compbiomed.2014.11.009. Epub 2014 Dec 8.
Accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, many alternative classification and prediction algorithms have been developed in the fields of data mining and machine learning. This study aimed to compare the performance of LR with that of several data mining algorithms for predicting 30-day surgical morbidity in children.
We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of five models for predicting surgical morbidity: (1) an LR model that assumed linearity and additivity (simple LR model), (2) an LR model incorporating restricted cubic splines and interactions (flexible LR model), (3) a support vector machine, (4) a random forest, and (5) boosted classification trees.
The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) than the simple LR model. However, none of the models performed better than the flexible LR model on these measures, or in model calibration or discrimination.
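As an illustration of the five classification measures named above, the following minimal sketch computes them from binary outcomes and predictions. It is not the authors' code: the function name `classification_metrics` and the example labels are assumptions made for illustration, and the numbers are made up rather than taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV for binary labels (1 = event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # events correctly flagged
        "specificity": ratio(tn, tn + fp),  # non-events correctly cleared
        "ppv": ratio(tp, tp + fp),          # flagged cases that are events
        "npv": ratio(tn, tn + fn),          # cleared cases that are non-events
    }


# Hypothetical labels for illustration only (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Calibration and discrimination, also mentioned above, are assessed on predicted probabilities rather than on thresholded labels, so they are outside the scope of this sketch.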
Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks.