Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China.
Suzhou Clinical Center of Digestive Diseases, Suzhou, 215000, China.
Dig Dis Sci. 2023 Jul;68(7):2866-2877. doi: 10.1007/s10620-023-07949-7. Epub 2023 May 9.
Recurrence of common bile duct stones (CBDS) is common after endoscopic retrograde cholangiopancreatography (ERCP), yet clinical prediction models for CBDS recurrence after ERCP are lacking.
We aimed to develop high-performance prediction models for the recurrence of CBDS after ERCP treatment using automated machine learning (AutoML) and to compare the AutoML models with traditional regression models.
A total of 473 patients with CBDS undergoing ERCP were enrolled in this single-center retrospective cohort study. Samples were randomly divided into a Training Set (65%) and a Validation Set (35%). Three modeling approaches, namely fully automated machine learning (Fully automated), semi-automated machine learning (Semi-automated), and traditional regression, were applied to fit prediction models. The models' discrimination, calibration, and clinical benefit were examined. Shapley additive explanations (SHAP), partial dependence plots (PDP), and SHAP local explanation (SHAPLE) were used to interpret the best model.
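The evaluation design described above (a random 65/35 split, a gradient boosting model compared against a regression baseline, discrimination measured by AUROC) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic (generated with `make_classification`, not the study cohort), scikit-learn's `GradientBoostingClassifier` and `LogisticRegression` stand in for the study's semi-automated GBM and traditional regression, and the feature count, class balance, and random seeds are arbitrary. The AutoML platform actually used is not named in the abstract and is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cohort: 473 "patients", binary outcome
# (1 = recurrence). Feature count and class balance are assumptions.
X, y = make_classification(
    n_samples=473, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)

# Random 65% / 35% split into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.35, random_state=0
)

# Hypothetical stand-ins for the compared modeling approaches.
models = {
    "gbm": GradientBoostingClassifier(random_state=0),
    "logistic": LogisticRegression(max_iter=1000),
}

# Fit each model on the training set and score discrimination (AUROC)
# on the held-out validation set.
auroc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    p_val = model.predict_proba(X_val)[:, 1]
    auroc[name] = roc_auc_score(y_val, p_val)

for name, value in auroc.items():
    print(f"{name}: validation AUROC = {value:.3f}")
```

The printed values depend entirely on the synthetic data and say nothing about the study's results; the point is only the shape of the comparison, with every model scored on the same held-out validation set.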
The area under the receiver operating characteristic curve (AUROC) of the semi-automated gradient boosting machine (GBM) model was 0.749 in the Validation Set, higher than that of the other fully/semi-automated models and the traditional regression models (highest AUROC = 0.736). The calibration and clinical applicability of the AutoML models were adequate. Through the SHAP-PDP-SHAPLE pipeline, the roles of the key variables in the semi-automated GBM model were visualized. Lastly, the best model was deployed online for clinical practitioners.
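For readers unfamiliar with the calibration and partial dependence checks mentioned above, a minimal sketch follows. It again assumes synthetic data and a scikit-learn `GradientBoostingClassifier` rather than the study's model. The helper `partial_dependence_1d` is a hypothetical hand-written function that computes a one-feature partial dependence curve directly from its definition; it is not the study's SHAP-PDP-SHAPLE pipeline, and SHAP values are not computed here.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the study cohort).
X, y = make_classification(
    n_samples=473, n_features=10, n_informative=5, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.35, random_state=0
)

gbm = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
p_val = gbm.predict_proba(X_val)[:, 1]

# Calibration: Brier score (lower is better) and a binned comparison of
# observed event rate against mean predicted probability.
brier = brier_score_loss(y_val, p_val)
observed, predicted = calibration_curve(y_val, p_val, n_bins=5)


def partial_dependence_1d(model, X, feature, grid):
    """Average predicted probability when one feature is forced to each
    grid value for every row while the other features are left as-is."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        curve.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(curve)


# Partial dependence of the predicted risk on feature 0 (arbitrary choice),
# evaluated at five quantiles of that feature in the validation set.
grid = np.quantile(X_val[:, 0], [0.1, 0.3, 0.5, 0.7, 0.9])
pdp = partial_dependence_1d(gbm, X_val, 0, grid)

print(f"Brier score: {brier:.3f}")
for obs, pred in zip(observed, predicted):
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")
print("partial dependence:", np.round(pdp, 3))
```

A well-calibrated model yields observed rates close to the predicted probabilities in each bin, and the partial dependence curve shows how the average predicted risk moves as a single variable changes.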
The GBM model based on semi-automated AutoML is an optimal model for predicting the recurrence of CBDS after ERCP treatment. Compared with traditional regression, AutoML algorithms present significant strengths in modeling and show promise for future clinical practice.