Department of Statistics, Noakhali Science and Technology University, Noakhali, 3814, Bangladesh.
Department of Information and Communication Engineering, Noakhali Science and Technology University, Noakhali, 3814, Bangladesh.
BMC Cardiovasc Disord. 2024 Apr 18;24(1):214. doi: 10.1186/s12872-024-03883-2.
Cardiovascular disorders (CVDs) are the leading cause of death worldwide. Low- and middle-income countries (LMICs) such as Bangladesh are also affected by several types of CVD, including heart failure and stroke. In Bangladesh, the leading cause of death has recently shifted from severe infections and parasitic illnesses to CVDs.
The study dataset comprised the medical records of 391 CVD patients, selected by simple random sampling and collected between August 2022 and April 2023. For comparison, 260 records were collected from individuals without CVD. Cross-tabulations and chi-square tests were used to assess the association between CVD and the explanatory variables. Logistic Regression, Naïve Bayes, Decision Tree, AdaBoost, Random Forest, Bagging Tree, and ensemble learning classifiers were used to predict CVD. Performance was evaluated using accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AU-ROC).
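The chi-square test of association mentioned above can be sketched in a few lines. This is a minimal illustration rather than the authors' analysis code: the function name `chi_square_statistic` and the example counts are hypothetical, and no continuity correction is applied.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a crosstab.

    `table` is a list of rows of observed counts, e.g. rows = exposure
    categories, columns = CVD / no CVD. No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2x2 crosstab (illustrative counts, not study data):
# rows = risk factor present / absent, columns = CVD / no CVD.
example_table = [[30, 10], [20, 40]]
stat, dof = chi_square_statistic(example_table)
print(f"chi-square = {stat:.4f}, degrees of freedom = {dof}")
```

The statistic would then be compared against a chi-square distribution with the returned degrees of freedom to obtain a p-value.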
Random Forest had the highest precision among the classifiers considered. The precision of each classifier was as follows: Logistic Regression (93.67%), Naïve Bayes (94.87%), Decision Tree (96.1%), AdaBoost (94.94%), Random Forest (96.15%), and Bagging Tree (94.87%). Random Forest also gave the best overall trade-off between correct and incorrect predictions: with 98.04% accuracy, it achieved the best precision (96.15%), a recall of 100%, and a high F1 score (97.7%). In contrast, Logistic Regression had the lowest accuracy, at 95.42%. Random Forest also achieved the highest AUC (0.989).
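For readers less familiar with these measures, the sketch below shows how accuracy, precision, recall (sensitivity), specificity, and F1 score follow from the four cells of a binary confusion matrix. It is an illustrative helper, not the study's code: the function name `classification_metrics` and the counts in the example are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: CVD cases predicted as CVD / as non-CVD
    tn/fp: non-CVD cases predicted as non-CVD / as CVD
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)      # share of predicted CVD that is truly CVD
    recall = tp / (tp + fn)         # sensitivity: share of true CVD detected
    specificity = tn / (tn + fp)    # share of non-CVD correctly ruled out
    accuracy = (tp + tn) / total
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical test-set counts (illustrative only, not taken from the study).
metrics = classification_metrics(tp=50, fp=2, tn=46, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it is pulled toward the lower of the two, which is why it is often reported alongside accuracy when the classes are imbalanced.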
This research focused on identifying the factors most strongly associated with CVD and on predicting CVD risk. The Random Forest technique is strongly recommended for use in systems that predict cardiac disease. This research may change clinical practice by giving doctors a new instrument for assessing a patient's CVD prognosis.