Zhang Xinyu, Lin Sen, Zeng Qingling, Peng Lisheng, Yan Chaoguang
The Fourth Clinical Medical College of Guangzhou University of Chinese Medicine, Shenzhen, Guangdong, China.
School of Pharmaceutical Sciences, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong, China.
Front Nutr. 2025 Jul 16;12:1612369. doi: 10.3389/fnut.2025.1612369. eCollection 2025.
This study aims to develop and validate a machine learning model that integrates dietary antioxidants to predict cardiovascular disease (CVD) risk in diabetic patients. By analyzing the contributions of key antioxidants using SHAP values, the study offers evidence-based insights and dietary recommendations to improve cardiovascular health in diabetic individuals.
This study used data from the U.S. National Health and Nutrition Examination Survey (NHANES) to develop predictive models incorporating antioxidant-related variables (vitamins, minerals, and polyphenols) alongside demographic, lifestyle, and health status factors. Data preprocessing involved collinearity removal, standardization, and class imbalance correction. Multiple machine learning models were developed and evaluated using the mlr3 framework, with benchmark testing performed to compare predictive performance. Feature importance in the best-performing model was interpreted using SHapley Additive exPlanations (SHAP).
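The three preprocessing steps named above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' mlr3 pipeline: the abstract does not state which collinearity criterion or imbalance-correction method was used, so the Pearson threshold of 0.9, the random oversampling strategy, and all function names and toy values below are assumptions chosen for illustration.

```python
import random
import statistics


def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) SD."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]
    return [(x - mean) / sd for x in column]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def drop_collinear(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below
    the threshold (0.9 is an assumed cutoff, not taken from the study)."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes match in size.
    Random oversampling is an assumed stand-in for the unspecified method."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy data (hypothetical values, not NHANES records).
features = {
    "magnesium": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "magnesium_copy": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],  # perfectly collinear
    "vitamin_a": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
labels = [0, 0, 0, 0, 1, 1]  # 1 = CVD, 0 = no CVD

kept = drop_collinear(features)
scaled = {name: standardize(values) for name, values in kept.items()}
rows = list(zip(*scaled.values()))
rows_bal, labels_bal = oversample_minority(rows, labels)
print(list(kept), len(labels_bal), sum(labels_bal))
```

In a real workflow, resampling would normally be applied only to the training folds, so that evaluation data keep the original class distribution.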
The analysis included 1,356 NHANES participants with diabetes, 332 of whom had comorbid CVD. After removing collinear variables, 27 dietary antioxidant features and 13 baseline covariates were retained. Among all models, XGBoost demonstrated the best predictive performance, with an accuracy of 87.4% (error rate 12.6%) and values of 0.949 for both the area under the ROC curve (AUC) and the precision-recall curve (PRC). SHAP analysis highlighted daidzein, magnesium (Mg), epigallocatechin-3-gallate (EGCG), pelargonidin, vitamin A, and theaflavin 3'-gallate as the most influential predictors.
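The cohort figures above imply a marked class imbalance, which a few lines of arithmetic make explicit. The sketch below uses only the numbers reported in the abstract; the variable names are illustrative, and the majority-class baseline refers to the raw cohort, since the abstract does not state the class mix of the evaluation data.

```python
n_total = 1356   # participants with diabetes
n_cvd = 332      # of whom had comorbid CVD
accuracy = 0.874 # reported XGBoost accuracy

n_no_cvd = n_total - n_cvd
prevalence = n_cvd / n_total            # share of positives in the cohort
imbalance_ratio = n_no_cvd / n_cvd      # negatives per positive
error_rate = 1 - accuracy               # complement of accuracy
majority_baseline = n_no_cvd / n_total  # accuracy of always predicting "no CVD"

print(n_no_cvd)                     # 1024
print(round(prevalence, 3))         # 0.245
print(round(imbalance_ratio, 2))    # 3.08
print(round(error_rate, 3))         # 0.126
print(round(majority_baseline, 3))  # 0.755
```

Roughly one participant in four had CVD, which is why accuracy alone is a weak summary here and why the AUC and PRC values are reported alongside it.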
XGBoost exhibited the highest predictive performance for cardiovascular disease risk in diabetic patients. SHAP analysis underscored the prominent contribution of dietary antioxidants, with daidzein and Mg emerging as the most influential predictors.
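To make concrete what a SHAP attribution measures, the following sketch computes exact Shapley values by brute force for a toy scoring function. It is an illustration only: the function `toy_score`, its coefficients, and the feature names `x1` to `x3` are hypothetical and unrelated to the study's fitted model, and practical SHAP tooling for tree models relies on much faster algorithms than enumerating every feature ordering.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings, switching features from baseline to instance values."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(x):
    """Hypothetical score with one interaction term (not the paper's model)."""
    return 0.5 - 0.2 * x["x1"] - 0.1 * x["x2"] - 0.05 * x["x1"] * x["x3"]


baseline = {"x1": 0.0, "x2": 0.0, "x3": 0.0}
instance = {"x1": 1.0, "x2": 1.0, "x3": 1.0}

phi = shapley_values(toy_score, instance, baseline)
print({name: round(value, 3) for name, value in phi.items()})
# The attributions sum to the change in prediction from baseline to instance.
print(round(sum(phi.values()), 3), round(toy_score(instance) - toy_score(baseline), 3))
```

The interaction term is split equally between `x1` and `x3`, and the attributions add up to the difference between the instance prediction and the baseline prediction, which is the additivity property that SHAP-based feature rankings rest on.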