Development and validation of machine learning models for nonalcoholic fatty liver disease.

Affiliations

Graduate School of Beijing University of Chinese Medicine, Beijing 100029, China; Department of Gastroenterology, China-Japan Friendship Hospital, Beijing 100029, China.

Phase 1 Clinical Trial Center, Deyang People's Hospital, Deyang 618000, China.

Publication information

Hepatobiliary Pancreat Dis Int. 2023 Dec;22(6):615-621. doi: 10.1016/j.hbpd.2023.03.009. Epub 2023 Mar 25.

DOI: 10.1016/j.hbpd.2023.03.009

PMID: 37005147

Abstract

BACKGROUND

Nonalcoholic fatty liver disease (NAFLD) has become the most prevalent liver disease worldwide. Early diagnosis can effectively reduce NAFLD-related morbidity and mortality. This study aimed to combine risk factors to develop and validate a novel model for predicting NAFLD.

METHODS

We enrolled 578 participants who had completed abdominal ultrasound as the training set. Least absolute shrinkage and selection operator (LASSO) regression combined with random forest (RF) was used to screen for significant predictors of NAFLD risk. Five machine learning models were developed: logistic regression (LR), RF, extreme gradient boosting (XGBoost), gradient boosting machine (GBM), and support vector machine (SVM). To further improve model performance, we conducted hyperparameter tuning with the train function in the Python package 'sklearn'. We included 131 participants who had completed magnetic resonance imaging as the testing set for external validation.
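The workflow described in the methods can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions: the data are synthetic (12 hypothetical candidate predictors), the held-out set is a random split rather than an external MRI cohort, an L1-penalized logistic regression stands in for LASSO on a binary outcome, and the rule for combining LASSO with RF (keep the LASSO-retained features, ranked by RF importance) is a guess, since the abstract does not specify it. scikit-learn does not expose a function named `train`, so `GridSearchCV` is used here as a stand-in for the tuning step. XGBoost is left out to keep the example limited to scikit-learn; `xgboost.XGBClassifier` could be added to the `models` dictionary in the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the cohort (assumption): 578 training + 131 testing rows.
X, y = make_classification(n_samples=709, n_features=12, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=131, stratify=y, random_state=0
)

# Step 1: feature screening. L1-penalized logistic regression plays the role of LASSO.
lasso = make_pipeline(
    StandardScaler(), LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
)
lasso.fit(X_train, y_train)
lasso_keep = np.abs(lasso[-1].coef_.ravel()) > 0

# Rank features by random forest importance, then keep up to 8 that LASSO also retained.
rf_screen = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
order = np.argsort(rf_screen.feature_importances_)[::-1]
selected = np.array([j for j in order if lasso_keep[j]][:8])
Xs_train, Xs_test = X_train[:, selected], X_test[:, selected]

# Step 2: fit the candidate models on the screened features.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=0)),
}


def positive_scores(model, X):
    """Return a continuous score for the positive class, usable for AUC."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)[:, 1]
    return model.decision_function(X)


aucs = {}
for name, model in models.items():
    model.fit(Xs_train, y_train)
    aucs[name] = roc_auc_score(y_test, positive_scores(model, Xs_test))

# Step 3: hyperparameter tuning by cross-validated grid search (GBM as the example).
param_grid = {"n_estimators": [100, 200], "learning_rate": [0.05, 0.1], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0), param_grid, scoring="roc_auc", cv=5
)
search.fit(Xs_train, y_train)
tuned_auc = roc_auc_score(y_test, positive_scores(search.best_estimator_, Xs_test))

for name, value in aucs.items():
    print(f"{name}: AUC = {value:.3f}")
print(f"Tuned GBM: AUC = {tuned_auc:.3f}, params = {search.best_params_}")
```

Because the data are synthetic, the printed AUCs have no bearing on the values reported in the study; the sketch only shows how screening, model fitting, and tuning can be chained.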

RESULTS

In the training set, 329 participants had NAFLD and 249 did not; in the testing set, 96 had NAFLD and 35 did not. Visceral adiposity index, abdominal circumference, body mass index, alanine aminotransferase (ALT), the ALT/aspartate aminotransferase (AST) ratio, age, high-density lipoprotein cholesterol (HDL-C), and elevated triglyceride (TG) were important predictors of NAFLD risk. The areas under the curve (AUCs) of LR, RF, XGBoost, GBM, and SVM were 0.915 (95% confidence interval [CI]: 0.886-0.937), 0.907 (95% CI: 0.856-0.938), 0.928 (95% CI: 0.873-0.944), 0.924 (95% CI: 0.875-0.939), and 0.900 (95% CI: 0.883-0.913), respectively. The XGBoost model showed the best predictive performance, and its AUC increased to 0.938 (95% CI: 0.870-0.950) after further parameter tuning.
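To make the reported quantities concrete, the following sketch computes an AUC and a 95% confidence interval for a set of predicted scores. The abstract does not say how the intervals were obtained, so the percentile bootstrap used here is an assumption (the DeLong method is another common choice). The scores are simulated, with group sizes chosen to match the testing set (96 with NAFLD, 35 without); the function names `auc` and `bootstrap_ci` are illustrative.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes for the AUC to be defined
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated scores (assumption): positives are drawn from a higher-mean distribution.
rng = random.Random(42)
labels = [1] * 96 + [0] * 35
scores = [rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUC = {point:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
```

With only 35 participants without NAFLD in the testing set, intervals of this kind are driven largely by the smaller class, which is worth keeping in mind when comparing models whose intervals overlap.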

CONCLUSIONS

This study developed and validated five novel machine learning models for NAFLD prediction. Among them, XGBoost showed the best performance and may serve as a reliable reference for the early identification of patients at high risk of NAFLD in clinical practice.
