Department of Endocrinology, The First Affiliated Hospital of Ningbo University, Ningbo, China.
Health Science Center, Ningbo University, Ningbo, China.
J Med Internet Res. 2023 Sep 12;25:e46891. doi: 10.2196/46891.
Nonalcoholic fatty liver disease (NAFLD) has emerged as a worldwide public health issue. Identifying and targeting populations at a heightened risk of developing NAFLD over a 5-year period can help reduce and delay adverse hepatic prognostic events.
This study aimed to investigate the 5-year incidence of NAFLD in the Chinese population. It also aimed to establish and validate a machine learning model for predicting the 5-year NAFLD risk.
The study population was derived from a 5-year prospective cohort study. A total of 6196 individuals without NAFLD who underwent health checkups in 2010 at Zhenhai Lianhua Hospital in Ningbo, China, were enrolled. Extreme gradient boosting (XGBoost)-recursive feature elimination, combined with the least absolute shrinkage and selection operator (LASSO), was used to screen for characteristic predictors. A total of 6 machine learning models, namely logistic regression, decision tree, support vector machine, random forest, categorical boosting (CatBoost), and XGBoost, were used to construct a 5-year risk model for NAFLD. Hyperparameters of the predictive models were optimized in the training set, and model performance was further evaluated in the internal and external validation sets.
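The recursive feature elimination step described above can be illustrated with a minimal sketch. It is not the study's pipeline: the authors ranked features with XGBoost and combined the result with LASSO, whereas this example ranks features by the absolute coefficient of a small gradient-descent logistic regression, purely to keep the code dependency-free. The toy data, the helper names (`fit_logistic`, `recursive_feature_elimination`, `make_toy_data`), and all settings are assumptions made for illustration.

```python
import math
import random


def fit_logistic(X, y, lr=0.5, epochs=150):
    """Fit logistic regression by full-batch gradient descent; return (weights, bias)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def recursive_feature_elimination(X, y, n_keep):
    """Refit repeatedly, dropping the feature with the smallest |coefficient| each round."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        X_sub = [[row[j] for j in kept] for row in X]
        w, _ = fit_logistic(X_sub, y)
        weakest = min(range(len(kept)), key=lambda k: abs(w[k]))
        kept.pop(weakest)
    return kept


def make_toy_data(n=400, seed=0):
    """Hypothetical data: features 0 and 1 drive the outcome, features 2-4 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(5)]
        prob = 1.0 / (1.0 + math.exp(-(2.0 * row[0] - 2.0 * row[1])))
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y


X, y = make_toy_data()
selected = recursive_feature_elimination(X, y, n_keep=2)
print("Retained feature indices:", selected)
```

Because the toy features share the same scale, coefficient magnitudes are comparable across features; with real checkup data, features would need standardizing first, or a tree-based importance (as in the study) could be used instead.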
The 5-year incidence of NAFLD was 18.64% (n=1155) in the study population. We screened 11 predictors for risk prediction model construction. After hyperparameter optimization, CatBoost demonstrated the best prediction performance in the training set, with an area under the receiver operating characteristic curve (AUROC) of 0.810 (95% CI 0.768-0.852). Logistic regression showed the best prediction performance in the internal and external validation sets, with AUROCs of 0.778 (95% CI 0.759-0.794) and 0.806 (95% CI 0.788-0.821), respectively. The development of web-based calculators has enhanced the clinical feasibility of the risk prediction model.
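As a worked illustration of the reported metrics, the sketch below recomputes the incidence from the stated counts (1155 of 6196) and shows one common way to obtain an AUROC with a 95% CI. The abstract does not say how the intervals were derived; the percentile bootstrap used here is an assumption, and the labels and scores are synthetic rather than study data. The function names `auroc` and `bootstrap_ci` are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUROC (an assumed method, not taken from the abstract)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain a single class
            continue
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Incidence from the counts given in the abstract.
incidence_pct = round(100 * 1155 / 6196, 2)
print("5-year incidence (%):", incidence_pct)

# Tiny hand-checkable example: 3 of 4 positive/negative pairs are ranked correctly.
tiny_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print("Tiny example AUROC:", tiny_auc)

# Synthetic scores: cases tend to score higher than non-cases.
rng = random.Random(1)
labels = [1] * 30 + [0] * 30
scores = [rng.gauss(1.0, 1.0) for _ in range(30)] + [rng.gauss(0.0, 1.0) for _ in range(30)]
point = auroc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
print("Synthetic AUROC and 95% CI:", point, ci_low, ci_high)
```

The pairwise definition is quadratic in sample size, which is fine for a sketch but slow for a cohort of several thousand; a rank-based formulation would scale better.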
Developing and validating machine learning models can aid in predicting which populations are at the highest risk of developing NAFLD over a 5-year period, thereby helping delay and reduce the occurrence of adverse liver prognostic events.