Predicting the risk of lean non-alcoholic fatty liver disease based on interpretable machine models in a Chinese T2DM population.

Authors

Bao Shixue, Jin Qiankai, Wang Tieqiao, Mao Yushan, Huang Guoqing

Affiliations

Department of Endocrinology, The First Affiliated Hospital of Ningbo University, Ningbo, Zhejiang, China.

Department of Endocrinology, Beilun People's Hospital, Ningbo, Zhejiang, China.

Publication information

Front Endocrinol (Lausanne). 2025 Jul 11;16:1626203. doi: 10.3389/fendo.2025.1626203. eCollection 2025.

Abstract

BACKGROUND

Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease and a serious threat to public health. Although lean NAFLD accounts for a smaller proportion of patients than obese NAFLD, it should not be overlooked. This study aimed to construct interpretable machine learning models for predicting the risk of lean NAFLD in patients with type 2 diabetes mellitus (T2DM).

METHODS

This study enrolled 1,553 individuals with T2DM who received health care at the First Affiliated Hospital of Ningbo University, Ningbo, China, from November 2019 to November 2024. Feature screening was performed using the Boruta algorithm and the Least Absolute Shrinkage and Selection Operator (LASSO). Linear discriminant analysis (LDA), logistic regression (LR), Naive Bayes (NB), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost) were used to construct risk prediction models for lean NAFLD in T2DM patients. The area under the receiver operating characteristic curve (AUC) was used to assess the predictive capacity of each model. Additionally, SHapley Additive exPlanations (SHAP) analysis was used to quantify the contribution of each feature to the predictions of the machine learning model.
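As a rough illustration of the AUC metric named above, the following standard-library Python sketch computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and adds a percentile bootstrap confidence interval. The labels and scores are made-up toy values, the function names are hypothetical, and the bootstrap is an assumption for illustration only; the abstract does not state how the study's confidence intervals were obtained.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains only one class; draw again
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy data, NOT from the study: 1 = lean NAFLD, 0 = no lean NAFLD.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]

auc = roc_auc(labels, scores)  # 7 of 9 positive/negative pairs are ordered correctly
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.3f}, bootstrap 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported model AUCs are read.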

RESULTS

The prevalence of lean NAFLD in the study population was 20.3%. Eight variables, including age, body mass index (BMI), and alanine aminotransferase (ALT), were identified as independent risk factors for lean NAFLD. Ten predictive factors, including BMI, ALT, and aspartate aminotransferase (AST), were selected for the construction of risk prediction models. The random forest model outperformed the other machine learning (ML) algorithms, achieving an AUC of 0.739 (95% confidence interval [CI]: 0.676-0.802) in the training set, and it also showed the best predictive value in the internal validation set, with an AUC of 0.789 (95% CI: 0.722-0.856). In addition, the SHAP method identified triglycerides (TG), ALT, gamma-glutamyl transferase (GGT), BMI, and uric acid (UA) as the top five variables influencing the predictions of the RF model.
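To make the SHAP-based ranking more concrete, the sketch below computes exact Shapley values for a single prediction by averaging each feature's marginal contribution over all feature orderings, with "absent" features set to a baseline value. The toy scoring function, its coefficients, and the input values are hypothetical and are not the study's random forest or data; SHAP libraries use faster algorithms, and global rankings are typically obtained by averaging absolute values across patients.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one prediction, relative to a baseline input."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)      # start with every feature "absent"
        prev = model(current)
        for i in order:
            current[i] = x[i]         # reveal feature i
            new = model(current)
            phi[i] += new - prev      # marginal contribution in this ordering
            prev = new
    n_orders = factorial(n)
    return [p / n_orders for p in phi]


def toy_model(features: List[float]) -> float:
    """Hypothetical risk score with a TG x ALT interaction (illustration only)."""
    tg, alt, bmi = features
    return 0.5 * tg + 0.02 * alt + 0.1 * bmi + 0.01 * tg * alt


names = ["TG", "ALT", "BMI"]
x = [2.0, 40.0, 23.0]          # made-up patient
baseline = [1.5, 25.0, 22.0]   # made-up reference values

phi = shapley_values(toy_model, x, baseline)

# Rank features by absolute contribution to this single prediction.
ranking = sorted(zip(names, phi), key=lambda item: abs(item[1]), reverse=True)
for name, value in ranking:
    print(f"{name}: {value:+.4f}")
```

The contributions sum to the difference between the prediction for the patient and the prediction at the baseline, which is the additivity property that lets SHAP attribute a model's output to individual features.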

CONCLUSION

Risk models for lean NAFLD built on a Chinese T2DM population, particularly the RF model, facilitate early prevention of and intervention for lean NAFLD, thereby reducing the risks of intrahepatic and extrahepatic adverse outcomes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/14e5/12289481/1cc4755d4c64/fendo-16-1626203-g001.jpg
