Hong Yan, Chen Xinrong, Wang Ling, Zhang Fan, Zeng ZiYing, Xie Weining
Affiliated Guangdong Hospital of Integrated Traditional Chinese and Western Medicine of Guangzhou University of Chinese Medicine, Guangzhou University of Chinese Medicine, Foshan, China.
First Clinical Medical College, Guangzhou University of Chinese Medicine, Guangzhou, China.
Front Nutr. 2025 Jun 30;12:1616229. doi: 10.3389/fnut.2025.1616229. eCollection 2025.
Metabolic dysfunction-associated fatty liver disease (MAFLD) is a prevalent and progressive liver disorder closely linked to obesity and metabolic dysregulation. Traditional anthropometric measures such as body mass index (BMI) are limited in their ability to capture fat distribution and associated risk. This study aimed to develop and validate machine learning (ML) models for predicting MAFLD using detailed body composition metrics and to explore the relative contributions of adipose tissue features through explainable ML techniques.
Data from the 2017-2018 National Health and Nutrition Examination Survey (NHANES) were used to construct predictive models based on anthropometric, demographic, lifestyle, and clinical variables. Six ML algorithms were implemented: decision tree (DT), support vector machine (SVM), generalized linear model (GLM), gradient boosting machine (GBM), random forest (RF), and XGBoost. The Boruta algorithm was used for feature selection, and model performance was evaluated using cross-validation and a validation set. SHapley Additive exPlanations (SHAP) were employed to interpret feature contributions.
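The attribution idea behind SHAP can be illustrated with a minimal sketch that computes exact Shapley values for a single sample by enumerating feature subsets. This is not the study's pipeline: the `toy_risk` function, its coefficients, the input values, and the all-zero baseline are hypothetical. SHAP tooling for tree ensembles such as GBM typically uses faster approximations over background data rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline sample."""
    n = len(x)
    features = list(range(n))
    phi = [0.0] * n

    def value(subset):
        # Features in `subset` take their real values; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    for i in features:
        others = [j for j in features if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                s = set(subset)
                # Shapley weight for a coalition of size k out of n features.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    # Hypothetical risk score over (VAT, BMI, SAT); coefficients are made up.
    vat, bmi, sat = z
    return 0.5 * vat + 0.3 * bmi + 0.1 * sat + 0.05 * vat * bmi


x = [2.0, 1.0, 1.0]          # hypothetical standardized feature values
baseline = [0.0, 0.0, 0.0]   # hypothetical reference sample
phi = shapley_values(toy_risk, x, baseline)
print(dict(zip(["VAT", "BMI", "SAT"], phi)))
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes per-feature contributions comparable.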
Among the six models, the GBM algorithm exhibited the best performance, achieving area under the receiver operating characteristic curve (AUC) values of 0.875 (training) and 0.879 (validation), with minimal fluctuations in sensitivity and specificity. SHAP analysis identified visceral adipose tissue (VAT), BMI, and subcutaneous adipose tissue (SAT) as the most influential predictors. VAT had the highest SHAP value, underscoring its central role in MAFLD pathogenesis.
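For readers less familiar with the reported metrics, the sketch below shows one way to compute AUC, sensitivity, and specificity from predicted scores. The labels, scores, and the 0.5 threshold are invented for illustration and are unrelated to the NHANES data. AUC is computed here as the fraction of positive-negative pairs ranked correctly, with ties counted as half.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when score >= threshold predicts positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels (1 = MAFLD) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(auc, sens, spec)
```

Unlike sensitivity and specificity, AUC does not depend on the chosen threshold, which is why it is commonly used to compare models across training and validation sets.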
This study demonstrates the effectiveness of integrating body composition features with machine learning techniques for MAFLD risk prediction. The GBM model offers robust predictive accuracy and interpretability, with potential applications in clinical decision-making and public health screening strategies. SHAP analysis provides meaningful insights into the relative importance of adiposity measures, reinforcing the value of fat distribution metrics beyond conventional obesity indices.