Teja M Darshan, Rayalu G Mokesh
Department of Mathematics, School of Advanced Sciences, Vellore Institute of Technology, Vellore, India.
BMC Cardiovasc Disord. 2025 Mar 22;25(1):212. doi: 10.1186/s12872-025-04627-6.
Cardiovascular disease is the leading cause of death worldwide, so accurate and timely predictive tools are needed to improve patient outcomes. In recent years, machine learning methods have shown considerable potential for improving the accuracy and effectiveness of health-related predictions, particularly in identifying heart disease. The dataset used in this study came from the UC Irvine Machine Learning Repository and combines data from Cleveland, Switzerland, Hungary, Long Beach, and Statlog. We selected seven of the 1,190 cases, each with 12 attributes, for analysis. We evaluated several machine learning models (Random Forest, K-Nearest Neighbors (KNN), Logistic Regression, Naïve Bayes, Gradient Boosting, AdaBoost, XGBoost, and Bagged Trees) using accuracy, precision, recall, F1-score, and ROC-AUC. K-fold cross-validation with K = 10 and K = 5 was used to assess the robustness and generalizability of these models. Random Forest was notably stable, reaching 94% accuracy with K = 10 and 92% with K = 5, whereas XGBoost declined slightly under cross-validation (90% for K = 10, 89% for K = 5). KNN showed signs of possible overfitting: its accuracy dropped markedly under cross-validation (71% for K = 10, 72% for K = 5). In the evaluation without cross-validation, XGBoost and Bagged Trees achieved the highest accuracy (93%), followed by Random Forest and KNN (91%). Random Forest and Bagged Trees also had the highest ROC-AUC (0.95), and XGBoost had a ROC-AUC of 0.94. These results indicate that ensemble methods are effective for predicting heart disease and suggest that incorporating hybrid models and advanced survival analysis techniques could improve prediction further.
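The abstract does not describe its implementation, so the following is only a minimal, dependency-free Python sketch of the evaluation protocol it names: shuffled (non-stratified) K-fold splitting, accuracy, precision, recall and F1, and ROC-AUC computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The helper names (`k_fold_indices`, `binary_metrics`, `roc_auc`, `cross_validate`, `knn_fit_predict`) and the toy data are hypothetical. The authors' fold strategy is not stated, and in practice a library such as scikit-learn (`KFold`, `cross_val_score`, `roc_auc_score`) would normally be used instead.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]
# Hypothetical model interface: fit_predict(X_train, y_train, X_test) -> 0/1 labels for X_test
FitPredict = Callable[[List[Row], List[int], List[Row]], List[int]]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 once, then cut them into k (train, test) folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = heart disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def knn_fit_predict(X_train: List[Row], y_train: List[int], X_test: List[Row],
                    n_neighbors: int = 3) -> List[int]:
    """Hypothetical example model: majority vote of the nearest training rows."""
    preds = []
    for x in X_test:
        # Squared Euclidean distance to every training row, closest first.
        ranked = sorted((sum((a - b) ** 2 for a, b in zip(x, xt)), yt)
                        for xt, yt in zip(X_train, y_train))
        votes = [label for _, label in ranked[:n_neighbors]]
        preds.append(1 if 2 * sum(votes) > len(votes) else 0)
    return preds


def cross_validate(X: List[Row], y: List[int], fit_predict: FitPredict,
                   k: int = 10, seed: int = 0) -> float:
    """Mean test-fold accuracy over k folds (the model is refit on every fold)."""
    accuracies = []
    for train, test in k_fold_indices(len(y), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        accuracies.append(binary_metrics([y[i] for i in test], preds)["accuracy"])
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Toy, clearly separable data (not the UCI heart disease dataset).
    X = [[0.0], [0.1], [0.2], [0.3], [0.4], [1.0], [1.1], [1.2], [1.3], [1.4]]
    y = [0] * 5 + [1] * 5
    for k in (10, 5):
        print(f"K = {k}: mean accuracy {cross_validate(X, y, knn_fit_predict, k=k):.2f}")
```

Any classifier can be plugged in through the `fit_predict` callback; the 3-nearest-neighbour model is included only because KNN is one of the compared methods. Comparing a model's cross-validated mean accuracy with its single-split accuracy is the kind of check the abstract relies on when it flags KNN's drop from 91% to about 71% as possible overfitting.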