Department of Surgery, Penn State College of Medicine, Hershey, Pennsylvania.
Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania.
Surg Obes Relat Dis. 2024 Nov;20(11):1056-1064. doi: 10.1016/j.soard.2024.08.008. Epub 2024 Aug 13.
Predicting the risk of complications is critical in metabolic and bariatric surgery (MBS).
To develop machine learning (ML) models to predict serious postoperative complications of MBS and evaluate racial fairness of the models.
Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program (MBSAQIP) national database, United States.
We developed logistic regression, random forest (RF), gradient-boosted tree (GBT), and XGBoost models using the MBSAQIP Participant Use Data File from 2016 to 2020. To address class imbalance, we randomly undersampled the complication-negative class to match the size of the complication-positive class. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), precision, recall, and F1 score. Fairness across White and non-White patient groups was assessed using the equal opportunity difference and disparate impact metrics.
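The undersampling step and the two fairness metrics named above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the binary `privileged` mask, and the toy arrays are assumptions made for illustration, and the metric definitions follow the commonly used forms (true-positive-rate difference and ratio of positive-prediction rates), which the abstract does not spell out.

```python
import numpy as np

def undersample_negative_class(X, y, seed=0):
    """Randomly drop negative-class rows until both classes are the same size.

    Assumes y is a 0/1 array and that positives are the minority class.
    """
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    keep_neg = rng.choice(neg_idx, size=len(pos_idx), replace=False)
    keep = np.sort(np.concatenate([pos_idx, keep_neg]))
    return X[keep], y[keep]

def equal_opportunity_difference(y_true, y_pred, privileged):
    """TPR(unprivileged) - TPR(privileged); 0 indicates parity.

    `privileged` is a boolean mask (hypothetical encoding of the group).
    """
    def tpr(mask):
        return y_pred[mask & (y_true == 1)].mean()
    return tpr(~privileged) - tpr(privileged)

def disparate_impact(y_pred, privileged):
    """P(pred=1 | unprivileged) / P(pred=1 | privileged); 1 indicates parity."""
    return y_pred[~privileged].mean() / y_pred[privileged].mean()

# Toy data, invented purely to exercise the functions.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 0])
privileged = np.array([True, True, True, True, False, False, False, False])

eod = equal_opportunity_difference(y_true, y_pred, privileged)
di = disparate_impact(y_pred, privileged)

X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
X_bal, y_bal = undersample_negative_class(X_demo, y_demo)
```

On the toy data the unprivileged group has a higher true-positive rate but the same positive-prediction rate as the privileged group, which shows that the two metrics can disagree and are worth reporting together.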
A total of 40,858 patients were included after undersampling the complication-negative class. The XGBoost model was the best-performing model in terms of AUROC; however, the difference was not statistically significant. While the F1 score and precision did not vary significantly across models, the RF exhibited better recall compared to the logistic regression. Surgery type was the most important feature to predict complications, followed by operative time. The logistic regression model had the best fairness metrics for race.
The XGBoost model achieved the highest AUROC, albeit without a statistically significant difference. The RF may be useful when recall is the primary concern. Undersampling of the privileged group may improve the fairness of boosted tree models.