Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China.
Department of Gastroenterology, The Changshu No. 1 Hospital of Soochow University, Suzhou, China.
Front Cell Infect Microbiol. 2022 Jun 10;12:886935. doi: 10.3389/fcimb.2022.886935. eCollection 2022.
Machine learning (ML) algorithms are widely used to build medical models because of their strong learning and generalization abilities. This study aims to explore different ML models for the early identification of severe acute pancreatitis (SAP) among patients hospitalized for acute pancreatitis.
This retrospective study enrolled patients with acute pancreatitis (AP) from multiple centers. Data from the First Affiliated Hospital and Changshu No. 1 Hospital of Soochow University were used for training and internal validation, and data from the Second Affiliated Hospital of Soochow University were used for external validation (January 2017 to December 2021). The diagnosis of AP and SAP was based on the 2012 revised Atlanta classification of acute pancreatitis. Models were built using traditional logistic regression (LR) and automated machine learning (AutoML) analysis with five types of algorithms. The LR models were evaluated with the receiver operating characteristic (ROC) curve, the calibration curve, and decision curve analysis (DCA). The AutoML models were evaluated and interpreted with feature importance, SHapley Additive exPlanation (SHAP) plots, and Local Interpretable Model Agnostic Explanation (LIME).
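Decision curve analysis, named above as one of the evaluation tools, compares a model's net benefit with that of treating every patient as high risk. The following is a minimal sketch of the standard net benefit formula, not the authors' code. The function names, the toy labels, and the toy probabilities are all hypothetical and do not come from the study.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold.

    Standard DCA formula: TP/n - FP/n * pt / (1 - pt),
    where pt is the threshold probability.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that flags every patient."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data (NOT from the study): 1 = SAP, 0 = non-severe AP.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.1, 0.05, 0.2, 0.3]

model_nb = net_benefit(toy_labels, toy_probs, threshold=0.25)
treat_all_nb = net_benefit_treat_all(toy_labels, threshold=0.25)
```

A decision curve plots these two quantities, together with the "treat none" strategy (net benefit 0), across a range of threshold probabilities. A model is considered useful at a given threshold when its net benefit exceeds both reference strategies.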
A total of 1,012 patients were included in the training/validation dataset to develop the AutoML models, and an independent dataset of 212 patients was used to test them. The gradient boosting machine (GBM) model outperformed the other models, with an area under the ROC curve (AUC) of 0.937 in the validation set and 0.945 in the test set. The GBM model also achieved the highest sensitivity (0.583) among the AutoML models. The eXtreme Gradient Boosting (XGBoost) model achieved the highest specificity (0.980) and the highest accuracy (0.958) in the test set.
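The metrics reported above (AUC, sensitivity, specificity, accuracy) can be computed from labels and predicted scores as sketched below. This is an illustrative pure-Python sketch, not the pipeline used in the study. The helper names, the 0.5 cutoff, and the toy data are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = SAP, 0 = non-severe AP.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.1, 0.05, 0.2, 0.3]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 cutoff

toy_metrics = classification_metrics(toy_labels, toy_preds)
toy_auc = auc(toy_labels, toy_scores)
```

On this toy data, accuracy is 0.9 while sensitivity is only about 0.67. When the positive class is a minority, accuracy and specificity can stay high even when a sizeable share of positive cases is missed, which is why sensitivity is reported separately.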
The GBM-based AutoML model showed clear clinical practicability for the early prediction of SAP.