Wang Yi-Xuan, Kang Jin-Quan, Chen Zuo-Guan, Gao Shang, Zhao Wen-Xin, Zhao Ning, Lan Yong, Li Yong-Jun
Department of Vascular Surgery, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China; Peking University Fifth School of Clinical Medicine, Beijing, China.
Beijing Information Science & Technology University, Beijing, China.
Ann Vasc Surg. 2025 May;114:154-162. doi: 10.1016/j.avsg.2024.12.077. Epub 2025 Jan 30.
Peripheral arterial disease (PAD) is a common manifestation of atherosclerosis, affecting over 200 million people worldwide, and its incidence is rising as the population ages. Common risk factors include smoking, diabetes, and hyperlipidemia, but the exact pathogenesis of PAD remains unclear. Nutritional intake is associated with the onset and progression of PAD, and some studies suggest that certain nutrients may influence its development, although relevant evidence remains limited. This study aims to explore the relationship between nutrition and PAD using machine learning (ML) techniques. Unlike traditional statistical methods, ML can effectively capture complex, nonlinear relationships, providing a more comprehensive analysis of PAD risk factors.
Data from the National Health and Nutrition Examination Survey (NHANES 1999-2004) were analyzed, including demographic, clinical, and dietary information. Nutrient intake was assessed through 24-h dietary recalls collected with the computer-assisted dietary interview system (CADI) and the automated multiple-pass method (AMPM). PAD was defined as an ankle-brachial index (ABI) < 0.9. Six ML models, namely extreme gradient boosting (XGBoost), random forest (RF), naive Bayes classifier (NB), support vector machine (SVM), logistic regression (LR), and decision tree (DT), were trained on a 70/30 train-test split; missing data were imputed, and class imbalance was addressed with the synthetic minority oversampling technique (SMOTE). Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, precision, recall, and F1 score. To enhance the interpretability of the models, Shapley additive explanations (SHAP) analysis was applied to identify the features with the greatest impact on PAD prediction and to quantify how much each variable contributes to the model's output.
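The case definition, split, oversampling, and evaluation steps named above can be illustrated with a small standard-library Python sketch. It is not the authors' code: the helper names (`label_pad`, `split_70_30`, `smote`, `binary_metrics`, `auroc`) are hypothetical, the six classifiers themselves are omitted, and it assumes, as is common practice although the abstract does not say so, that SMOTE is applied to the training split only so that the test set keeps its original class balance.

```python
import math
import random


def label_pad(abi: float) -> int:
    """PAD case definition used in the abstract: ankle-brachial index < 0.9."""
    return 1 if abi < 0.9 else 0


def split_70_30(rows, test_frac=0.3, seed=0):
    """Shuffle rows and hold out test_frac of them (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling of minority-class feature vectors.

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours (Euclidean).
    Assumed to be called on the training split only.
    """
    if len(minority) < 2 or k < 1:
        raise ValueError("need at least two minority samples and k >= 1")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics named in the abstract; recall equals sensitivity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def auroc(y_true, scores):
    """AUROC as P(random case scores above random non-case), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

One common choice is to set `n_new` to the majority count minus the minority count in the training split, which balances the classes. In practice, tested implementations such as imbalanced-learn's `imblearn.over_sampling.SMOTE` and scikit-learn's `sklearn.metrics.roc_auc_score` would be used instead; the pairwise AUROC here is O(n²) and only makes the definition concrete.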
Of 31,126 participants, 4,520 met the inclusion criteria (mean age 61.2 ± 13.5 years; 48.8% male), and 441 (9.7%) had ABI < 0.9. XGBoost outperformed the other models, achieving an AUROC of 0.913 (95% CI, 0.891-0.936) and an F1 score of 0.932. With SMOTE, its AUROC improved to 0.926 (95% CI, 0.889-0.936) and its F1 score to 0.937. SHAP analysis identified vitamin C, saturated fatty acid, selenium, phosphorus, and protein intake as key predictors of PAD.
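SHAP ranks predictors by Shapley values: each feature's average marginal contribution to a prediction over all subsets of the other features. The sketch below computes exact Shapley values for a tiny toy model under stated assumptions: the function `exact_shapley`, the single-baseline value function, and the toy models in the checks are hypothetical. Tree ensembles such as XGBoost are normally explained with the `shap` package's `TreeExplainer`, and the abstract does not specify the SHAP settings the authors used.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline point.

    phi_i = sum over S in N\\{i} of |S|!(n-|S|-1)!/n! * [v(S + {i}) - v(S)],
    where v(S) evaluates the model with features in S taken from x and all
    other features from the baseline. Cost grows as 2**n, so tiny n only.
    """
    n = len(x)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # subset sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis
```

For a linear model with a zero baseline, each Shapley value reduces to coefficient times feature value, and in general the values sum to `model(x) - model(baseline)`. A positive value pushes the prediction toward PAD and a negative one away from it, which is how SHAP summaries typically convey direction; these values describe the fitted model's behavior, not causal effects.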
This is the first study to apply ML algorithms to examine the relationship between nutrient intake and PAD in a general population. Vitamin C and phosphorus intake were negatively associated with PAD, whereas saturated fatty acid, protein, and selenium intake were positively associated.