Özdede Murat, Batur Ali, Aksoy Alp Eren
Department of Internal Medicine, Faculty of Medicine, Hacettepe University, Ankara, Türkiye.
Department of Emergency Medicine, Faculty of Medicine, Hacettepe University, Ankara, Türkiye.
Turk J Emerg Med. 2025 Jan 2;25(1):32-40. doi: 10.4103/tjem.tjem_161_24. eCollection 2025 Jan-Mar.
Traditional scoring systems have been widely used to predict the severity of acute pancreatitis (AP) but have limited predictive accuracy. This study investigates whether machine learning (ML) algorithms can improve predictive accuracy in AP.
A retrospective study was conducted using data from 101 AP patients at a tertiary hospital in Türkiye. After preprocessing, synthetic cases were generated by adding Gaussian noise, and the data set was balanced with the ADASYN algorithm, yielding 250 cases. Supervised ML models, including random forest (RF) and XGBoost (XGB), were trained, tested, and validated, and their performance was compared with that of traditional clinical scores (Ranson's, modified Glasgow, and BISAP) using the area under the curve (AUC), F1 score, and recall.
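The augmentation and balancing steps described above can be sketched in pure Python. This is a minimal illustration, not the authors' pipeline: it assumes numeric feature vectors, a binary outcome with 1 as the minority (adverse) class, and illustrative parameter values (noise `sigma`, `k=5`, `beta=1.0`) that the abstract does not report. `add_gaussian_noise` and `adasyn` are hypothetical helper names; in practice the imbalanced-learn library provides an `ADASYN` class, and the function below is only a simplified re-implementation of the same idea (more synthetic points for minority cases surrounded by majority cases).

```python
import math
import random


def add_gaussian_noise(rows, sigma, copies, rng):
    """Hypothetical helper: return `copies` jittered duplicates of every row."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows for _ in range(copies)]


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Simplified ADASYN-style oversampler on lists of numeric feature vectors."""
    rng = random.Random(seed)
    minor_idx = [i for i, label in enumerate(y) if label == minority]
    n_major = len(y) - len(minor_idx)
    # G: total number of synthetic minority samples to create (beta=1 -> full balance)
    G = round((n_major - len(minor_idx)) * beta)
    if G <= 0 or len(minor_idx) < 2:
        return list(X), list(y)

    def nearest(i, pool):
        # indices in `pool` (excluding i) sorted by Euclidean distance to X[i]; keep k
        return sorted((j for j in pool if j != i), key=lambda j: math.dist(X[i], X[j]))[:k]

    # r_i: share of majority-class points among the k nearest neighbours of each minority point
    ratios = [sum(y[j] != minority for j in nearest(i, range(len(X)))) / k for i in minor_idx]
    total = sum(ratios)
    weights = [r / total for r in ratios] if total > 0 else [1 / len(ratios)] * len(ratios)

    new_X, new_y = list(X), list(y)
    for i, w in zip(minor_idx, weights):
        neighbours = nearest(i, minor_idx)  # minority-only neighbours used for interpolation
        for _ in range(round(w * G)):  # harder-to-learn points receive more synthetic samples
            z = X[rng.choice(neighbours)]
            lam = rng.random()
            new_X.append([a + lam * (b - a) for a, b in zip(X[i], z)])
            new_y.append(minority)
    return new_X, new_y


# Toy, made-up data (not study data): 20 "mild" (0) vs 6 "severe" (1) cases, two features
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(6)]
y = [0] * 20 + [1] * 6

X_aug = X + add_gaussian_noise(X, sigma=0.05, copies=1, rng=rng)  # sigma is an assumed value
y_aug = y + y
X_bal, y_bal = adasyn(X_aug, y_aug, minority=1, k=5)
print(y_bal.count(0), y_bal.count(1))
```

Because each per-point count is rounded, the classes end up approximately rather than exactly balanced; imbalanced-learn's implementation also rounds per-sample counts.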
RF outperformed XGB, with an AUC of 0.89, an F1 score of 0.82, and a recall of 0.82. BISAP showed balanced performance (AUC = 0.70, F1 = 0.44, recall = 0.85), whereas the modified Glasgow criteria had the highest recall but lower precision (AUC = 0.70, F1 = 0.38, recall = 0.95). Ranson's admission criteria were the least effective (AUC = 0.53, F1 = 0.42, recall = 0.39), probably because the 48-hour parameters were not included.
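The metrics reported above are related: recall is TP / (TP + FN), precision is TP / (TP + FP), F1 is their harmonic mean, and AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The pure-Python sketch below uses hypothetical function names and toy labels that are not from the study; it illustrates how a cut-off that flags most patients, like the pattern reported for the modified Glasgow criteria, can reach high recall while precision, and therefore F1, stays low.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN) for binary labels where 1 = adverse outcome."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def recall_precision_f1(y_true, y_pred):
    tp, fp, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


def roc_auc(y_true, scores):
    """AUC as P(random positive outscores random negative); ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up labels (not study data): 3 adverse outcomes among 10 patients
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
liberal_rule = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # a cut-off that flags most patients
print(recall_precision_f1(y_true, liberal_rule))  # recall 1.0, precision 0.375, F1 ~ 0.545
```

Here every adverse case is caught (recall 1.0), but five of the eight flagged patients are false positives, which pulls F1 well below recall.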
ML models, especially RF, significantly outperform traditional clinical scores in predicting adverse outcomes in AP, suggesting that integrating ML into clinical practice could improve prognostic assessments.