Huang Tao, Li Yuanyuan, Wang Simin, Qiao Shijie, Zheng Xiujuan, Xiong Wenhui, Yang Menghan, Huang Xirui, Gao Bizhen
College of Integrative Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China.
College of Traditional Chinese Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China.
Ann Med. 2025 Dec;57(1):2519679. doi: 10.1080/07853890.2025.2519679. Epub 2025 Jun 22.
Genome-wide association studies have provided substantial insight into the genetic aetiology of metabolic syndrome (MetS). However, machine-learning (ML)-based predictive models for assessing individual genetic susceptibility to MetS are lacking. This study used single-nucleotide polymorphisms (SNPs) as predictor variables and built ML-based genetic risk score (GRS) models to predict the occurrence of MetS, with the aim of moving genetic risk assessment closer to clinical application.
Feature selection was performed using the Least Absolute Shrinkage and Selection Operator (LASSO). Six ML algorithms were used to construct GRS models, and five-fold cross-validation was used for internal validation. The best-performing GRS model was selected on the basis of the receiver operating characteristic (ROC) curve, and SHapley Additive exPlanations (SHAP) were then applied to interpret it. After the GRS was extracted, analyses stratified by BMI, age and gender were performed. Finally, these conventional risk factors and the GRS were integrated through multivariate logistic regression to establish a combined model.
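The validation and combined-model steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the data are synthetic, the three predictors (a standardized GRS, BMI and age) and their effect sizes are assumptions chosen for illustration, and a hand-written gradient-descent logistic regression stands in for whatever software the study used. The function names (`roc_auc`, `fit_logistic`, `cross_validated_auc`, `make_synthetic`) are hypothetical.

```python
import math
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def linear_scores(model, X):
    """Linear predictor; it is monotone in the predicted probability, so the AUC is unchanged."""
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


def stratified_folds(y, k, rng):
    """Deal shuffled indices of each class round-robin so every fold holds both classes."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validated_auc(X, y, k=5, seed=0):
    """Return the held-out AUC of each of the k folds."""
    aucs = []
    for held_out in stratified_folds(y, k, random.Random(seed)):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = linear_scores(model, [X[i] for i in held_out])
        aucs.append(roc_auc([y[i] for i in held_out], scores))
    return aucs


def make_synthetic(n=300, seed=1):
    """Synthetic cohort: standardized GRS, BMI and age with assumed (made-up) effect sizes."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        grs, bmi, age = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        p = 1.0 / (1.0 + math.exp(-(0.8 * grs + 1.2 * bmi + 0.5 * age)))
        X.append([grs, bmi, age])
        y.append(1 if rng.random() < p else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    fold_aucs = cross_validated_auc(X, y, k=5)
    print("fold AUCs:", [round(a, 3) for a in fold_aucs])
    print("mean ROC-AUC:", round(sum(fold_aucs) / len(fold_aucs), 3))
```

Averaging the held-out AUC over folds gives the kind of "mean ROC-AUC" figure reported for internal validation; the values printed here describe only the synthetic data and say nothing about the study's results.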
A total of 17 SNPs were selected for analysis. Among the GRS models, the extreme gradient boosting (XGBoost) model showed the best discriminative performance (AUC = 0.837), and five-fold cross-validation also showed it to be the most robust (mean ROC-AUC = 0.706). SHAP analysis of the XGBoost model not only elucidated the global effects of the 17 SNPs across all samples but also described interactions between SNPs and provided a visual representation of how SNPs influence the predicted MetS risk of an individual. The GRS was strongly associated with MetS risk, particularly among young individuals, males and overweight individuals. Furthermore, the model combining conventional risk factors and the GRS exhibited excellent discriminative performance (AUC = 0.962) and robustness (mean ROC-AUC = 0.959).
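To make concrete how Shapley-value attributions split a prediction among SNPs, including a shared interaction term, the sketch below computes exact Shapley values by enumerating feature subsets for a toy risk function. Everything in it is an assumption for illustration: `toy_risk`, its coefficients and the three-SNP genotype are invented, and a single all-reference baseline replaces the background dataset that the SHAP library would normally average over. It does not reproduce the tree-specific algorithm SHAP applies to XGBoost models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are held at their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(genotype):
    """Hypothetical risk score on three SNP risk-allele counts (0, 1 or 2).

    The a * b term is an assumed interaction between the first two SNPs.
    """
    a, b, c = genotype
    return 0.10 * a + 0.05 * b + 0.02 * c + 0.04 * a * b


if __name__ == "__main__":
    individual = [2, 1, 0]   # risk-allele counts for one hypothetical person
    reference = [0, 0, 0]    # baseline genotype with no risk alleles
    for name, contribution in zip(["SNP_A", "SNP_B", "SNP_C"],
                                  shapley_values(toy_risk, individual, reference)):
        print(name, round(contribution, 4))
```

In this toy case the interaction contributes 0.08 to the score and is divided equally between the two interacting SNPs, while the third SNP, which sits at its baseline value, receives zero. The attributions sum to the difference between the individual's score and the baseline score, which is the additivity property that SHAP plots rely on.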
This study established a reliable XGBoost-based GRS model and a GRS prediction platform (https://metabolicsyndromeapps.shinyapps.io/geneticriskscore/) to assess individual genetic susceptibility to MetS. The model is highly interpretable and can provide a personalized reference for deciding whether primary prevention measures for MetS are needed. In addition, conventional risk factors and the GRS may interact, and integrating both in a combined model is useful for predicting the occurrence of MetS.