Machine learning prediction of metabolic dysfunction-associated fatty liver disease risk in American adults using body composition: explainable analysis based on SHapley Additive exPlanations.

Authors

Hong Yan, Chen Xinrong, Wang Ling, Zhang Fan, Zeng ZiYing, Xie Weining

Affiliations

Affiliated Guangdong Hospital of Integrated Traditional Chinese and Western Medicine of Guangzhou University of Chinese Medicine, Guangzhou University of Chinese Medicine, Foshan, China.

First Clinical Medical College, Guangzhou University of Chinese Medicine, Guangzhou, China.

Publication information

Front Nutr. 2025 Jun 30;12:1616229. doi: 10.3389/fnut.2025.1616229. eCollection 2025.

DOI:10.3389/fnut.2025.1616229

PMID:40661678

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12256230/

Abstract

BACKGROUND

Metabolic dysfunction-associated fatty liver disease (MAFLD) is a prevalent and progressive liver disorder closely linked to obesity and metabolic dysregulation. Traditional anthropometric measures such as body mass index (BMI) are limited in their ability to capture fat distribution and associated risk. This study aimed to develop and validate machine learning (ML) models for predicting MAFLD using detailed body composition metrics and to explore the relative contributions of adipose tissue features through explainable ML techniques.

METHODS

Data from the 2017-2018 National Health and Nutrition Examination Survey (NHANES) were used to construct predictive models based on anthropometric, demographic, lifestyle, and clinical variables. Six ML algorithms were implemented: decision tree (DT), support vector machine (SVM), generalized linear model (GLM), gradient boosting machine (GBM), random forest (RF), and XGBoost. The Boruta algorithm was used for feature selection, and model performance was evaluated using cross-validation and a validation set. SHapley Additive exPlanations (SHAP) were employed to interpret feature contributions.
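To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a single instance of a small toy scoring function. It is an illustration only: the function `toy_score`, its coefficients, the feature values, and the all-zero baseline are hypothetical and are not taken from the study. The abstract does not describe the SHAP configuration that was used, and practical SHAP tooling typically relies on faster approximations and a background dataset rather than the brute-force enumeration shown here.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(features):
    """Hypothetical risk score over (VAT, BMI, SAT); coefficients are made up."""
    vat, bmi, sat = features
    return 0.5 * vat + 0.3 * bmi + 0.1 * sat + 0.05 * vat * bmi


x = [2.0, 1.0, 1.0]          # hypothetical standardized feature values
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The contributions sum to the difference between the score at `x` and the score at the baseline, which is the additivity property that gives SHAP its name. The interaction term is split equally between the two features involved.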

RESULTS

Among the six models, the GBM algorithm exhibited the best performance, achieving area under the receiver operating characteristic curve (AUC) values of 0.875 (training) and 0.879 (validation), with minimal fluctuations in sensitivity and specificity. SHAP analysis identified visceral adipose tissue (VAT), BMI, and subcutaneous adipose tissue (SAT) as the most influential predictors. VAT had the highest SHAP value, underscoring its central role in MAFLD pathogenesis.
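For readers less familiar with the AUC metric reported above, the following minimal sketch computes it from its rank-based definition: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented for illustration and have no connection to the NHANES data or to the study's models.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = MAFLD, 0 = no MAFLD) and predicted risks
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. Comparing the value on training and validation data, as the abstract does, is one way to check for overfitting.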

CONCLUSION

This study demonstrates the effectiveness of integrating body composition features with machine learning techniques for MAFLD risk prediction. The GBM model offers robust predictive accuracy and interpretability, with potential applications in clinical decision-making and public health screening strategies. SHAP analysis provides meaningful insights into the relative importance of adiposity measures, reinforcing the value of fat distribution metrics beyond conventional obesity indices.
