Application of Machine Learning Models in Predicting Non-Alcoholic Fatty Liver Disease Among Inactive Chronic Hepatitis B Patients: A Cross-Sectional Analysis.

Author information

Al-Alawi Abdullah M, Al-Balushi Amna S, Al-Shuaili Halima H, Mahmood Dalia A, Al-Busafi Said A

Affiliations

Department of Medicine, Sultan Qaboos University Hospital, Muscat 123, Oman.

Internal Medicine Program, Oman Medical Specialty Board, Muscat 130, Oman.

Publication information

J Clin Med. 2025 Jul 16;14(14):5042. doi: 10.3390/jcm14145042.

DOI:10.3390/jcm14145042

PMID:40725732

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12295028/

Abstract

Background: Non-alcoholic fatty liver disease (NAFLD) poses a significant health challenge, especially among patients with chronic hepatitis B (CHB). This study uses machine learning (ML) models to predict NAFLD in patients with inactive CHB. It builds on previous research by applying classification algorithms to demographic, clinical, and laboratory data to identify predictors of NAFLD.

Methods: A single-center cross-sectional study was conducted, including 450 patients with inactive CHB from Sultan Qaboos University Hospital. Five ML models were developed: Logistic Regression, Random Forest, Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP).

Results: The prevalence of NAFLD was 50.22%. Among the ML models, Random Forest achieved the highest performance with an ROC AUC of 0.983 (95% CI: 0.952-0.999), followed by XGBoost at 0.977 (95% CI: 0.938-0.999) and MLP at 0.963 (95% CI: 0.915-0.995). SVM also performed strongly, with an AUC of 0.949 (95% CI: 0.897-0.985), while Logistic Regression showed comparatively lower discrimination, with an AUC of 0.886 (95% CI: 0.799-0.952). Key predictive features included platelet count, low-density lipoprotein (LDL), hemoglobin, and alanine aminotransferase (ALT). In the Logistic Regression model, platelet count was the most significant negative predictor, while LDL and ALT were positive contributors.

Conclusions: This study shows the utility of ML in improving the identification and management of NAFLD in CHB patients, enabling targeted interventions. Future research should expand on these findings by integrating genetic and lifestyle factors to enhance predictive accuracy across diverse populations.
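The abstract reports each model's ROC AUC with a 95% confidence interval but does not say how the intervals were computed. As an illustration only, the sketch below shows one common approach, a percentile bootstrap over the evaluation set. The function names, the resampling scheme, and the synthetic labels and scores are all assumptions for this example; none of them come from the study.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive case scores
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the ROC AUC.

    Assumption: this is a generic technique, not necessarily the
    method used by the study's authors.
    """
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample has one class only
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic stand-in data (hypothetical, not patient data): 90 cases with
# roughly balanced labels and scores that tend to be higher for positives.
demo_rng = random.Random(42)
y_true = [i % 2 for i in range(90)]
y_score = [demo_rng.gauss(0.7 if y == 1 else 0.3, 0.2) for y in y_true]

auc = roc_auc(y_true, y_score)
ci_low, ci_high = bootstrap_auc_ci(y_true, y_score)
print(f"AUC = {auc:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

With a real model, `y_true` would hold the NAFLD labels of the held-out patients and `y_score` the predicted probabilities. The width of the interval depends strongly on the size of the evaluation set, so wider intervals are expected for smaller test sets.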
