Ji Yijian, Shang Hongyan, Yi Jing, Zang Wenhui, Cao Wenjun
Academy of Public Health, Shanxi Medical University, Jinzhong, Shanxi, China.
Academy of Medical Sciences, Shanxi Medical University, Jinzhong, Shanxi, China.
Acta Diabetol. 2025 Apr 1. doi: 10.1007/s00592-025-02496-1.
Type 2 diabetes and coronary heart disease are highly prevalent in the Chinese population and are leading causes of mortality. Their co-occurrence is difficult to diagnose and carries a poor prognosis, and it therefore imposes a significant disease burden. In recent years, machine learning has frequently been applied to diagnostic tasks in medicine; however, predictive models for type 2 diabetes complicated by coronary heart disease have suffered from low predictive performance and from interference by other comorbidities during prediction.
This study sought to improve the accuracy, sensitivity, specificity, F1 score, and AUC of models that predict the coexistence of type 2 diabetes and coronary heart disease. We developed a prediction model using XGBoost, combined with SHAP for feature analysis. By comparing feature selection strategies, optimizing hyperparameters, and analyzing computational efficiency, we identified the conditions under which the model performed best. External validation on independent datasets supported the model's robustness and generalizability, and its potential implementation in clinical practice.
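The five evaluation metrics named above can be illustrated with a minimal sketch. It assumes binary labels (1 = T2DM with CHD, 0 = T2DM without CHD) and a 0.5 decision threshold; the function names and the toy labels and scores are hypothetical and are not taken from the study. AUC is computed here by the pairwise (Mann-Whitney) definition rather than by any particular library routine.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), specificity, and F1 score."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc_score(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_metrics = classification_metrics(toy_labels, toy_preds)
toy_auc = auc_score(toy_labels, toy_scores)
```

Accuracy, sensitivity, specificity, and F1 depend on the chosen threshold, whereas AUC depends only on how the scores rank positive cases against negative ones, which is one reason the two kinds of metric can diverge on the same model.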
This study compared three models (Random Forest, LightGBM, and XGBoost) and found that XGBoost performed best in both predictive performance and computational efficiency. The accuracy (Acc) of the XGBoost model was 0.8910, which improved to 0.8942 after hyperparameter tuning. External validation using datasets from Pingyang Hospital and Heji Hospital in Shanxi Province, China, yielded an AUC of 0.7897, supporting the model's generalizability. By integrating SHAP (SHapley Additive exPlanations) for interpretability, our study identified bilirubin levels, basophil count, cholesterol levels, and age as key features for predicting the coexistence of type 2 diabetes mellitus (T2DM) and coronary heart disease (CHD). These findings are consistent with the feature importance rankings determined by the XGBoost algorithm. The model demonstrates moderate predictive performance (AUC = 0.7879 in external validation) with practical interpretability, offering potential utility in improving diagnostic efficiency for T2DM-CHD comorbidity in resource-limited settings. However, its clinical implementation requires further validation in diverse populations.
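To illustrate the idea behind SHAP attributions, the following minimal sketch computes exact Shapley values by brute force for a toy scoring function. It is not the study's model and does not use the SHAP library, which relies on far more efficient algorithms for tree ensembles. The feature names echo those reported above, but the function `toy_model` and all of its weights are hypothetical.

```python
from itertools import combinations
from math import factorial

# Feature names mirror the abstract; the weights below are invented.
FEATURES = ["bilirubin", "basophil", "cholesterol", "age"]


def toy_model(present):
    """Hypothetical risk score given the set of features that are 'present'."""
    score = 0.0
    if "bilirubin" in present:
        score += 0.30
    if "age" in present:
        score += 0.20
    # Interaction term: cholesterol contributes only together with age.
    if "cholesterol" in present and "age" in present:
        score += 0.10
    return score


def shapley_values(value_fn, features):
    """Exact Shapley values: the weighted average marginal contribution
    of each feature over all subsets of the remaining features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


phi = shapley_values(toy_model, FEATURES)
# Features ranked by absolute attribution, largest first.
ranking = sorted(FEATURES, key=lambda f: abs(phi[f]), reverse=True)
```

In this toy case the interaction term is split evenly between cholesterol and age, and the attributions sum to the difference between the full-feature score and the empty-set score. That additivity is what allows per-feature contributions to be ranked and compared with a model's built-in importance scores.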