Liu Lihe, Lin Jiaxi, Liu Lu, Gao Jingwen, Xu Guoting, Yin Minyue, Liu Xiaolin, Wu Airong, Zhu Jinzhou
Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China.
Department of Gastroenterology, Beijing Friendship Hospital, Capital Medical University, Beijing, China.
Digit Health. 2024 Aug 7;10:20552076241272535. doi: 10.1177/20552076241272535. eCollection 2024 Jan-Dec.
Nonalcoholic fatty liver disease (NAFLD) is recognized as one of the most common chronic liver diseases worldwide. This study aims to assess the efficacy of automated machine learning (AutoML) in the identification of NAFLD using a population-based cross-sectional database.
All data, including laboratory examinations, anthropometric measurements, and demographic variables, were obtained from the National Health and Nutrition Examination Survey (NHANES). NAFLD was defined by the controlled attenuation parameter (CAP) measured with liver ultrasound transient elastography. Least absolute shrinkage and selection operator (LASSO) regression was used for feature selection. Six algorithms were run on the H2O automated machine learning platform: Gradient Boosting Machine (GBM), Distributed Random Forest (DRF), Extremely Randomized Trees (XRT), Generalized Linear Model (GLM), eXtreme Gradient Boosting (XGBoost), and Deep Learning (DL). These algorithms were selected for their complementary strengths, including the ability to handle complex, non-linear relationships, high predictive accuracy, and interpretability. The models were evaluated by the area under the receiver operating characteristic curve (AUC) and interpreted using calibration curves, decision curve analysis, variable importance plots, SHapley Additive exPlanations (SHAP) plots, partial dependence plots, and local interpretable model-agnostic explanations (LIME) plots.
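The LASSO feature-selection step can be illustrated with a minimal sketch. The abstract does not say how the LASSO model was fitted, so everything below is an assumption made for illustration: the data are synthetic, the target is continuous with squared-error loss (for a binary outcome such as NAFLD an L1-penalised logistic model would be the more usual choice), and the penalty `lam` is picked by hand rather than by cross-validation. The sketch only shows the mechanism by which the L1 penalty drives uninformative coefficients to exactly zero.

```python
import random

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def standardize(col):
    """Centre a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
    return [(v - mean) / sd for v in col]

def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    `columns` must already be standardized, so each coordinate update is a
    single soft-thresholding step.
    """
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # residual when every coefficient is 0
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(x * r for x, r in zip(xj, residual)) / n + beta[j]
            new_beta = soft_threshold(rho, lam)
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(xj, residual)]
                beta[j] = new_beta
    return beta

# Synthetic example (not NHANES data): only features 0 and 2 carry signal.
random.seed(0)
n_samples, n_features = 200, 5
raw = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
y = [2.0 * raw[0][i] - 1.5 * raw[2][i] + random.gauss(0, 0.5) for i in range(n_samples)]

X = [standardize(col) for col in raw]
beta = lasso_coordinate_descent(X, y, lam=0.3)  # lam chosen by hand for the demo
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", [round(b, 3) for b in beta])
print("selected feature indices:", selected)
```

Features whose coefficient stays at exactly zero are dropped; the surviving features would then be passed on to the modelling stage.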
A total of 4177 participants (3167 without NAFLD and 1010 with NAFLD) were included to develop and validate the AutoML models. The XGBoost model outperformed the other AutoML models, achieving an AUC of 0.859, an accuracy of 0.795, a sensitivity of 0.773, and a specificity of 0.802 on the validation set.
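For readers less familiar with the reported metrics, the following sketch shows how AUC, accuracy, sensitivity, and specificity are computed from predicted scores and true labels. The labels, scores, and the 0.5 decision threshold are made up for illustration; the abstract does not state which threshold the authors used, and none of the numbers below come from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher than
    a random negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def threshold_metrics(labels, scores, threshold):
    """Accuracy, sensitivity, and specificity at a fixed decision threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }

# Toy data: 4 cases with the condition (label 1) and 6 without (label 0).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)
metrics = threshold_metrics(labels, scores, threshold=0.5)
print("AUC:", round(auc, 4))
print({name: round(value, 4) for name, value in metrics.items()})
```

Accuracy is a prevalence-weighted average of sensitivity and specificity. If the validation set had the same NAFLD prevalence as the full cohort (1010/4177, about 24.2%), which is an assumption, then 0.773 × 0.242 + 0.802 × 0.758 ≈ 0.795, which agrees with the reported accuracy.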
We developed an XGBoost model to better evaluate the presence of NAFLD. Based on the XGBoost model, we created an R Shiny web-based application named Shiny NAFLD (http://39.101.122.171:3838/App2/). This application demonstrates the potential of AutoML in clinical research and practice, offering a promising tool for the real-world identification of NAFLD.