Department of Surgery, University of California, San Francisco, San Francisco, California, United States.
Department of General, HPB Surgery and Liver Transplantation, Ghent University Hospital, Ghent, Belgium.
J Gastrointest Surg. 2024 Jun;28(6):956-965. doi: 10.1016/j.gassur.2024.03.006. Epub 2024 Mar 12.
Machine learning (ML) approaches have become increasingly popular in predicting surgical outcomes. However, it is unknown whether they are superior to traditional statistical methods such as logistic regression (LR). This study aimed to perform a systematic review and meta-analysis to compare the performance of ML vs LR models in predicting postoperative outcomes for patients undergoing gastrointestinal (GI) surgery.
A systematic search of Embase, MEDLINE, Cochrane, Web of Science, and Google Scholar was performed through December 2022. The primary outcome was the discriminatory performance of ML vs LR models as measured by the area under the receiver operating characteristic curve (AUC). A meta-analysis was then performed using a random effects model.
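As a rough illustration of the pooling step, the sketch below implements one common random-effects estimator, DerSimonian-Laird. The abstract states only that a random effects model was used, so the choice of estimator is an assumption, and the function name `dersimonian_laird` and its inputs are hypothetical. Each study contributes an effect (for example, its ML-minus-LR difference in AUC) and the variance of that effect.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effects with a DerSimonian-Laird random-effects model.

    effects   -- per-study effect estimates (e.g. ML minus LR difference in AUC)
    variances -- within-study variances of those estimates
    Returns (pooled_effect, (ci_low, ci_high), tau2).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures dispersion of the studies around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to every within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    # Normal-approximation 95% confidence interval.
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2
```

When the studies agree closely, `tau2` is truncated to zero and the result matches the fixed-effect estimate. As the studies disagree more, `tau2` grows, the weights become more equal across studies, and the confidence interval widens.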
A total of 62 LR models and 143 ML models were included across 38 studies. On average, the best-performing ML models had a significantly higher AUC than the LR models (ΔAUC, 0.07; 95% CI, 0.04-0.09; P < .001). The same held on the logit scale: the best-performing ML models had a significantly higher logit(AUC) than the LR models (Δlogit[AUC], 0.41; 95% CI, 0.23-0.58; P < .001). Approximately half of the studies (44%) were found to have a low risk of bias. In a subset analysis restricted to these low-risk studies, the difference in logit(AUC) remained significant (ML vs LR, Δlogit[AUC], 0.40; 95% CI, 0.14-0.66; P = .009).
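To make the logit scale more concrete, the following sketch shows the transformation logit(AUC) = ln(AUC / (1 - AUC)) and its inverse. The helper names and the baseline AUC in the usage note are hypothetical; only the shift of 0.41 comes from the results above. Because the reported Δlogit(AUC) is a pooled average of per-study differences, applying it to a single baseline is illustrative arithmetic rather than a result of the study.

```python
import math


def logit(p):
    """Map a proportion in (0, 1), such as an AUC, to the log-odds scale."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a log-odds value back to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-x))


def auc_after_logit_shift(baseline_auc, delta_logit):
    """AUC obtained by adding delta_logit to logit(baseline_auc)."""
    return inv_logit(logit(baseline_auc) + delta_logit)
```

For a hypothetical LR model with an AUC of 0.70, `auc_after_logit_shift(0.70, 0.41)` gives roughly 0.78. The same shift applied to a higher baseline yields a smaller absolute gain in AUC, since the logit scale stretches values near 1. This keeps differences comparable between studies whose baseline AUCs differ.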
We found a significant improvement in discriminatory ability when using ML over LR algorithms in predicting postoperative outcomes for patients undergoing GI surgery. Subsequent efforts should establish standardized protocols for both developing and reporting studies using ML models and explore the practical implementation of these models.