Yu Chenyan, Li Yao, Yin Minyue, Gao Jingwen, Xi Liting, Lin Jiaxi, Liu Lu, Zhang Huixian, Wu Airong, Xu Chunfang, Liu Xiaolin, Wang Yue, Zhu Jinzhou
Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China.
Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China.
J Pers Med. 2022 Nov 19;12(11):1930. doi: 10.3390/jpm12111930.
Objective: To evaluate the feasibility of automated machine learning (AutoML) for predicting 30-day mortality in patients with non-cholestatic cirrhosis.
Methods: A total of 932 cirrhotic patients from the First Affiliated Hospital of Soochow University were included between 2014 and 2020. Participants were split into training and validation datasets at a ratio of 8.5:1.5. Models were developed on the H2O AutoML platform using the training dataset and then evaluated on the validation dataset by the area under the receiver operating characteristic curve (AUC). The best AutoML model was interpreted using SHapley Additive exPlanations (SHAP) plots, partial dependence plots (PDP), and Local Interpretable Model-agnostic Explanations (LIME).
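The split-and-evaluate step described in the methods can be illustrated with a minimal sketch. It is not the authors' pipeline, which ran on the AutoML platform; it only shows, in plain Python, what an 8.5:1.5 random split and an AUC computation involve. The function names, the fixed random seed, and the toy labels and scores are assumptions made for illustration.

```python
import random


def split_train_validation(records, train_fraction=0.85, seed=0):
    """Shuffle the records and split them into training and validation lists.

    train_fraction=0.85 mirrors the 8.5:1.5 ratio; the seed is an
    illustrative assumption, not a value reported in the study.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Compute AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example with made-up values: 1 = died within 30 days, 0 = survived.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise definition used here is equivalent to the area under the ROC curve and is easy to verify by hand, although it scales quadratically with sample size; for larger datasets a rank-based implementation from an established library would be the more practical choice.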
Results: The model based on the extreme gradient boosting (XGBoost) algorithm performed better (AUC 0.888) than the other AutoML models (logistic regression 0.673, gradient boosting machine 0.886, random forest 0.866, deep learning 0.830, stacking 0.850) and the existing scoring systems (model for end-stage liver disease [MELD] score 0.778, MELD-Na score 0.782, and albumin-bilirubin [ALBI] score 0.662). The most important variable in the XGBoost model was high-density lipoprotein cholesterol, followed by creatinine, white blood cell count, and international normalized ratio, among others.
Conclusion: The AutoML model based on the XGBoost algorithm outperformed the existing scoring systems in predicting 30-day mortality in patients with non-cholestatic cirrhosis, which shows the promise of AutoML for future medical applications.