Exploration and analysis of risk factors for coronary artery disease with type 2 diabetes based on SHAP explainable machine learning algorithm.

Author information

Tang Dandan, Liang Fengwei, Gu Xingli, Jin Yuanyuan, Hu Xuanjie, Liu Fen, Yang Yining

Affiliations

Postdoctoral Research Station of Clinical Medicine, Xinjiang Medical University, Urumqi, 830017, China.

College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, 830017, China.

Publication information

Sci Rep. 2025 Aug 12;15(1):29521. doi: 10.1038/s41598-025-11142-3.

Abstract

Type 2 diabetes mellitus (T2DM) is a major risk factor for coronary heart disease (CHD). In recent years, machine learning algorithms have demonstrated significant advantages in improving predictive accuracy; however, studies applying these methods to the clinical prediction and diagnosis of CHD combined with T2DM (CHD-DM2) remain limited. This study aims to evaluate the performance of machine learning models and to develop an interpretable model that identifies critical risk factors for CHD-DM2, thereby supporting clinical decision-making. Data were collected from cardiovascular inpatients admitted to the First Affiliated Hospital of Xinjiang Medical University between 2001 and 2018. A total of 12,400 patients were included, comprising 10,257 cases of CHD and 2,143 cases of CHD-DM2. To address the class imbalance in the dataset, the SMOTENC algorithm was applied during data preprocessing using the themis package. Final predictors were identified through a combination of univariate analysis and Lasso regression. We then developed and validated seven machine learning models: Logistic, Logistic_Lasso, KNN, SVM, XGBoost, RF, and LightGBM. The predictive performance of the models was compared using accuracy, sensitivity, specificity, the ROC curve and its AUC, and decision curve analysis (DCA). Additionally, SHAP values were used to interpret the model outputs. The dataset was split into a training set (n = 8,460) and a validation set (n = 3,680) at a 7:3 ratio. A total of 25 predictive variables were ultimately identified through Lasso regression analysis. Among the seven machine learning models, the RF model demonstrated significantly superior performance and achieved the highest net benefit in the DCA. According to SHAP analysis, Diabetes.History, blood glucose (BG), and HbA1c were the top contributors to CHD-DM2 risk and were therefore identified as its primary risk factors. It is recommended that hospitals enhance monitoring of such patients, document the presence of high-risk factors, and implement targeted intervention strategies accordingly.
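
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code (the abstract indicates an R workflow via the themis package), and the function names and toy data below are hypothetical, chosen only for illustration. The sketch computes accuracy, sensitivity, and specificity from a confusion matrix, and the net benefit used in decision curve analysis under its standard definition, NB = TP/n - (FP/n) * pt/(1 - pt), where pt is the threshold probability.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), predicting positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity at one probability threshold.

    Assumes y_true contains at least one positive and one negative label.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Net benefit at threshold pt: TP/n - (FP/n) * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the 'treat everyone' reference strategy in DCA."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Toy data, invented for illustration only (not taken from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]

print(classification_metrics(y_true, y_prob, threshold=0.5))
for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A decision curve plots the model's net benefit against the "treat everyone" and "treat no one" (net benefit 0) strategies across a range of thresholds; a model with the highest net benefit over clinically relevant thresholds, as reported for RF here, is preferred under that criterion.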

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da59/12344076/d2bc6347227a/41598_2025_11142_Fig1_HTML.jpg