Deng Lang, Lu Kongjie, Hu Huanhuan
Huzhou Central Hospital, Fifth School of Clinical Medicine of Zhejiang Chinese Medical University, Huzhou, China.
Huzhou Central Hospital, Affiliated Central Hospital of Huzhou University, Huzhou, China.
PLoS One. 2025 Sep 12;20(9):e0330377. doi: 10.1371/journal.pone.0330377. eCollection 2025.
Coronary heart disease (CHD) is a major contributor to the global burden of cardiovascular disease. Traditional diagnostic methods, such as coronary angiography and electrocardiography, face challenges including high cost, subjectivity, and high misdiagnosis rates. To address these issues, this study proposes a CHD prediction framework based on the LightGBM algorithm, aiming to improve both the accuracy and the interpretability of CHD risk prediction.
This study utilized three publicly available datasets: BRFSS_2015, Framingham, and Z-Alizadeh Sani. The BRFSS_2015 dataset was used for model training, while the Framingham and Z-Alizadeh Sani datasets were employed for validation. Data preprocessing included cleaning, feature engineering, and handling missing values. The LightGBM model was selected for its efficiency and performance, and SHAP (SHapley Additive exPlanations) values were used to enhance model interpretability. Model performance was evaluated using metrics such as accuracy, precision, recall, F1-score, and AUROC. A CHD scoring system was developed based on the model's predictions to assist clinicians in risk assessment.
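To make the interpretability step concrete, the following is a minimal sketch of the Shapley attribution idea that SHAP values are built on. It computes exact Shapley values by brute force for a hypothetical three-feature risk function. The function `toy_risk`, its coefficients, and the binary features are illustrative assumptions and are not taken from the study; a practical analysis of a LightGBM model would normally rely on a dedicated SHAP implementation rather than on this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for input x relative to a baseline input.

    Features outside a coalition take their baseline value. The cost is
    exponential in the number of features, so this only suits toy cases.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_i = set(subset)
                phi[i] += weight * (value(without_i | {i}) - value(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk function (NOT the study's model)."""
    age_over_60, smoker, diabetes = features
    return (0.05 + 0.10 * age_over_60 + 0.08 * smoker
            + 0.06 * diabetes + 0.04 * smoker * diabetes)


patient = [1, 1, 1]    # hypothetical patient with all three risk factors
reference = [0, 0, 0]  # hypothetical baseline with none of them
phi = shapley_values(toy_risk, patient, reference)
for name, contribution in zip(["age_over_60", "smoker", "diabetes"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for the patient and the prediction for the baseline, which is the property that lets per-feature attributions be read as shares of an individual risk estimate.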
On the BRFSS_2015 dataset, the LightGBM model achieved an accuracy of 90.60% and an AUROC of 81.06%. Parameter tuning changed these figures only marginally, to an accuracy of 90.61% and an AUROC of 81.11%. On the Framingham dataset, the accuracy improved from 83.96% to 85.26%, and the AUROC increased from 62.86% to 67.37%. On the Z-Alizadeh Sani dataset, the accuracy improved from 78.69% to 80.33%, and the precision increased from 74.40% to 76.36%.
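For readers less familiar with the metrics reported above, the sketch below shows how accuracy, precision, recall, F1-score, and AUROC are defined for a binary classifier. The labels and scores are made-up toy values, not data from the study, and the 0.5 decision threshold is an assumption. Accuracy, precision, recall, and F1 all depend on that threshold, whereas AUROC is computed from the ranking of the scores alone.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Threshold-dependent metrics for binary labels (1 = CHD, 0 = no CHD)."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auroc(y_true, y_score):
    """AUROC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only).
y_true = [0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9]
metrics = classification_metrics(y_true, y_score)
metrics["auroc"] = auroc(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```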
SHAP analysis revealed that age, smoking status, diabetes, hypertension, and high cholesterol were the most influential features in predicting CHD risk. The developed CHD scoring system provided a user-friendly tool for clinicians to assess patient risk levels effectively.
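The abstract does not describe how the CHD score is derived from the model's predictions. Purely as an illustration, the sketch below assumes one simple possibility: the predicted probability is rescaled to a 0 to 100 score and then binned into risk levels. The function names and the cutoffs of 20 and 50 are hypothetical and have no clinical validity.

```python
def chd_score(probability):
    """Map a predicted CHD probability in [0, 1] to a 0-100 score.

    Assumed linear rescaling, not the study's scoring rule.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return round(probability * 100)


def risk_level(score, low_cutoff=20, high_cutoff=50):
    """Bin a score into a risk level using hypothetical cutoffs."""
    if score < low_cutoff:
        return "low"
    if score < high_cutoff:
        return "moderate"
    return "high"


# Hypothetical predicted probabilities for three patients.
for probability in (0.12, 0.35, 0.80):
    score = chd_score(probability)
    print(f"p={probability:.2f} -> score {score} ({risk_level(score)} risk)")
```

In practice, any such cutoffs would need to be chosen and validated against clinical outcomes before the score is used for risk assessment.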