Dabbah Shoham, Mishani Itamar, Davidov Yana, Ben Ari Ziv
Liver Diseases Center, Sheba Medical Center, Ramat Gan, Israel.
Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
Digestion. 2025;106(3):189-202. doi: 10.1159/000542241. Epub 2024 Oct 25.
This study aimed to train machine learning algorithms (MLAs) to detect advanced fibrosis (AF) in patients with metabolic dysfunction-associated steatotic liver disease (MASLD) in the primary care setting, and to explain their predictions to ensure responsible use by clinicians.
Readily available features of 618 MASLD patients followed up at a tertiary center were used to train five MLAs. AF was defined as liver stiffness ≥9.3 kPa measured by two-dimensional shear wave elastography (n = 495) or as fibrosis stage ≥F3 on liver biopsy (n = 123). In a validation cohort of 540 MASLD patients from the primary care setting, the MLAs were compared with the Fibrosis-4 index (FIB-4) and the non-alcoholic fatty liver disease (NAFLD) fibrosis score (NFS). Feature importance, partial dependence, and SHapley Additive exPlanations (SHAP) were used to explain the predictions.
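The two comparator scores follow standard published formulas; the Python sketch below computes them alongside the reference-standard AF label defined above, using the rule-in cut-offs quoted in the results. Function names, argument names, and units (U/L for enzymes, 10^9/L for platelets, g/dL for albumin) are illustrative assumptions, not the authors' code. The precedence rule that prefers a biopsy stage when both measurements exist is also a hypothetical choice; the group sizes (495 + 123 = 618) suggest each patient had only one.

```python
import math

# Rule-in cut-offs quoted in the abstract for the primary care comparison.
FIB4_RULE_IN = 1.3    # FIB-4 >= 1.3 counted as positive
NFS_RULE_IN = -1.45   # NFS >= -1.45 counted as positive
LSM_CUTOFF_KPA = 9.3  # liver stiffness (2D-SWE) threshold for AF


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def nfs(age_years, bmi, ifg_or_diabetes, ast_u_l, alt_u_l,
        platelets_10e9_l, albumin_g_dl):
    """NAFLD fibrosis score using the standard published coefficients."""
    return (-1.675
            + 0.037 * age_years
            + 0.094 * bmi
            + 1.13 * (1 if ifg_or_diabetes else 0)
            + 0.99 * (ast_u_l / alt_u_l)
            - 0.013 * platelets_10e9_l
            - 0.66 * albumin_g_dl)


def advanced_fibrosis_label(lsm_kpa=None, biopsy_stage=None):
    """Reference label: biopsy stage >= F3, else liver stiffness >= 9.3 kPa.

    Preferring biopsy when both are present is a hypothetical tie-break.
    """
    if biopsy_stage is not None:
        return biopsy_stage >= 3
    if lsm_kpa is not None:
        return lsm_kpa >= LSM_CUTOFF_KPA
    raise ValueError("need a biopsy stage or a liver stiffness measurement")


# Hypothetical patient, for illustration only.
score_fib4 = fib4(50, 40, 25, 200)
score_nfs = nfs(50, 30, True, 40, 25, 200, 4.0)
print(score_fib4 >= FIB4_RULE_IN, score_nfs >= NFS_RULE_IN)
```

Both scores reduce each patient to one number with a fixed threshold, which is the baseline the gradient-boosted model is measured against.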
Extreme gradient boosting (XGBoost) achieved an AUC of 0.91, outperforming FIB-4 (AUC = 0.78) and NFS (AUC = 0.81; both p < 0.05), with a specificity of 76% versus 59% for FIB-4 ≥1.3 and 48% for NFS ≥-1.45 (p < 0.05). Its sensitivity (91%) was superior to that of FIB-4 (79%). XGBoost confidently excluded AF (negative predictive value [NPV] = 99%) and had the highest positive predictive value (PPV = 31%), superior to FIB-4 and NFS (all p < 0.05). The most important features were glycated hemoglobin (HbA1c) and gamma-glutamyl transpeptidase (GGT), with a steep increase in AF probability at HbA1c >6.5%. The strongest interaction was between aspartate aminotransferase (AST) and age. XGBoost, but not logistic regression, extracted informative patterns from alanine aminotransferase (ALT), low-density lipoprotein cholesterol, and alkaline phosphatase (p < 0.001). The SHAP values of GGT, platelets, and ALT were associated with false-positive (FP) classification; using them, one-quarter of the FPs were correctly reclassified at the cost of only one additional false negative.
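The pairing of a very high NPV with a modest PPV is typical of screening in a population where the condition is uncommon. The sketch below applies Bayes' theorem to the reported XGBoost sensitivity (91%) and specificity (76%); the prevalence values are hypothetical, since the AF prevalence of the primary care cohort is not stated here, and the study's own PPV and NPV come from its confusion matrix rather than from this calculation.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    tp = sensitivity * prevalence              # true positives (fraction)
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Reported XGBoost operating point; prevalences are hypothetical.
for prev in (0.05, 0.10, 0.20):
    ppv, npv = predictive_values(0.91, 0.76, prev)
    print(f"prevalence={prev:.0%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

At low prevalence most positives are false positives even with good specificity, which is why a way to flag likely FPs (here, via SHAP values) adds practical value.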
An explainable XGBoost algorithm outperformed FIB-4 and NFS in screening for AF in MASLD patients in the primary care setting. The algorithm also proved safe to use, as clinicians can understand its predictions and flag FP classifications.