Behera Tapan Kumar, Sathia Siddhartha, Panigrahi Sibarama, Naik Pradeep Kumar
Centre of Excellence in Natural Products and Therapeutics, Department of Biotechnology and Bioinformatics, Sambalpur University, Jyoti Vihar, Burla, Sambalpur, Odisha, India.
Department of Cardiothoracic Surgery (CTVS), All India Institute of Medical Sciences, Sijua, Patrapada, Bhubaneswar, Odisha, India.
J Biopharm Stat. 2024 Nov 24:1-23. doi: 10.1080/10543406.2024.2429524.
Cardiovascular diseases (CVDs) include abnormal conditions of the heart, diseased blood vessels, structural heart problems, and blood clots. Traditionally, CVD has been diagnosed by clinical experts, physicians, and medical specialists, a process that is expensive and time-consuming and that depends on expert intervention. The emergence of machine learning (ML) and statistical techniques now makes cost-effective digital diagnosis of CVD possible.
In this research, extensive experiments were carried out to classify CVD using 19 promising ML models. To evaluate and rank the models for CVD classification, two benchmark CVD datasets were taken from well-known sources such as Kaggle and the UCI repository. The results were analysed for each dataset individually and for their combination to assess the efficiency and reliability of the models on the basis of various performance measures, such as accuracy, precision, recall, F1 score, and kappa. Because some of the ML models are stochastic, the simulation was repeated 50 times for each dataset and each model, and nonparametric statistical tests were applied to draw decisive conclusions.
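The performance measures named above can all be derived from a binary confusion matrix. The following is a minimal sketch, assuming 0/1 labels with 1 as the positive (CVD) class; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, F1, and Cohen's kappa for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

In a repeated-run setup like the one described, a function of this kind would be called once per run and per model, and the resulting scores collected for the statistical comparison.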
The nonparametric Friedman-Nemenyi hypothesis test suggests that the Extra Tree Classifier provides statistically superior accuracy and precision compared with all other models, whereas the Extreme Gradient Boost (XGBoost) classifier provides statistically superior recall, kappa, and F1 score. Additionally, the XGBRF classifier ranks statistically second in terms of recall.
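The ranking step behind a Friedman test can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each row holds the scores of k models on one run, higher scores are better, ties share the mean rank, and the statistic is the standard chi-square form without a tie correction. The function names and the score table are hypothetical. The Nemenyi post-hoc step, which compares differences in average rank against a critical difference based on the studentized range distribution, is not included.

```python
def average_ranks(scores):
    """Average rank of each model over all rows (rank 1 = best score)."""
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        # Model indices sorted from best to worst score in this row.
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            # Extend j over the group of models tied with position i.
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1  # tied models share the mean rank
            for m in order[i:j + 1]:
                totals[m] += mean_rank
            i = j + 1
    return [t / len(scores) for t in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return (12 * n / (k * (k + 1))
            * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4))


# Hypothetical accuracy scores: 4 runs (rows) x 3 models (columns).
scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.75],
    [0.95, 0.70, 0.60],
    [0.90, 0.85, 0.80],
]
ranks = average_ranks(scores)
stat = friedman_statistic(scores)
```

A large statistic relative to the chi-square distribution with k - 1 degrees of freedom indicates that the models do not all perform equally, which is the condition under which a post-hoc test such as Nemenyi's is then applied.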