Fine tuned CatBoost machine learning approach for early detection of cardiovascular disease through predictive modeling.

Author information

Hamid Muhammad, Hajjej Fahima, Alluhaidan Ala Saleh, Bin Mannie Norah Waleed

Affiliations

Department of Computer Science, Government College Women University Sialkot, Sialkot, 51310, Pakistan.

Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, 11671, Saudi Arabia.

Publication information

Sci Rep. 2025 Aug 25;15(1):31199. doi: 10.1038/s41598-025-13790-x.

DOI:10.1038/s41598-025-13790-x

PMID:40854918

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12378338/

Abstract

Cardiovascular disease (CVD) remains one of the leading causes of morbidity and mortality worldwide, highlighting the urgent need for early-stage diagnosis to improve clinical outcomes. Machine learning (ML) approaches have demonstrated substantial potential in predictive modeling for CVD risk assessment. In this study, we propose an advanced predictive model based on the CatBoost algorithm to classify various stages of CVD using hospital records as the primary data source. The dataset, sourced from a publicly available repository, comprises 12 key predictor variables. The proposed methodology incorporates feature selection, rigorous validation processes, and data augmentation to enhance predictive performance and address the challenges associated with high-dimensional medical data. Among several ML algorithms evaluated, the fine-tuned CatBoost model achieved the highest performance, automating feature selection and facilitating the detection of early-stage heart disease. The model attained an impressive F1-score of 99% and an overall accuracy of 99.02%, outperforming existing ML-based approaches. These findings underscore the potential of the CatBoost algorithm for rapid and accurate CVD diagnosis, thereby supporting clinical decision-making. Future work will focus on external validation and testing on independent datasets to further assess the model's generalizability and clinical applicability.
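The abstract reports performance as an F1-score and an overall accuracy. As a minimal sketch of how these two metrics are typically computed from true and predicted class labels, the following uses only the Python standard library. It is not the authors' evaluation code: the function names (`accuracy`, `f1_per_class`, `macro_f1`) and the toy labels are hypothetical, and the use of macro averaging is an assumption, since the abstract does not state which F1 averaging scheme was used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), computed one-vs-rest."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels (0 = no disease, 1 = disease); not data from the study.
toy_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
toy_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]

print("accuracy:", accuracy(toy_true, toy_pred))
print("per-class F1:", f1_per_class(toy_true, toy_pred))
print("macro F1:", macro_f1(toy_true, toy_pred))
```

Accuracy and F1 can diverge when classes are imbalanced, which is one reason both are commonly reported together for clinical classifiers.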
