Department of Electrical and Computer Engineering, Institute of Science, Altinbas University, Istanbul 34218, Turkey.
Medicina (Kaunas). 2022 Nov 28;58(12):1745. doi: 10.3390/medicina58121745.
Background and Objectives: Many recent studies have focused on the early diagnosis of coronary artery disease (CAD), one of the leading causes of cardiac-related death worldwide. The performance of machine learning (ML) systems, which can enable timely and accurate treatment, depends on the effectiveness of the most important features influencing disease diagnosis. We developed a hybrid ML framework based on hard ensemble voting optimization (HEVO) to classify patients with CAD using the Z-Alizadeh Sani dataset. All categorical features were converted to numerical form, the synthetic minority oversampling technique (SMOTE) was applied to overcome the imbalanced distribution between the two classes in the dataset, and recursive feature elimination (RFE) with a random forest (RF) was then used to obtain the best subset of features.

Materials and Methods: After correcting the biased class distribution in the CAD dataset with SMOTE, we identified the highly correlated features that affected the classification of CAD patients. Grid search optimization was used to evaluate the proposed model and to identify the best hyperparameters for four models, namely RF, AdaBoost, gradient boosting, and extra trees, which were combined in a hard ensemble voting (HEV) classifier.

Results: Five-fold cross-validation experiments showed that the HEV classifier achieved excellent predictive performance with the 10 best features obtained after SMOTE balancing and feature selection. With the HEV classifier, all evaluation metrics exceeded 98%; gradient boosting was the second-best classification model, with an accuracy of 97% and an F1-score of 98%.

Conclusions: Compared with modern methods, the proposed method performs well in diagnosing coronary artery disease. It could therefore be used by medical personnel as a supplementary aid for the timely, accurate, and efficient identification of CAD cases in suspected patients.
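SMOTE balances the two classes by synthesizing new minority-class samples rather than duplicating existing ones: each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbours. The pure-Python sketch below shows only that interpolation step, under simplifying assumptions: all features are already numeric (as after the categorical-to-numeric conversion described above), distance is Euclidean, and the function `smote` and its parameters are hypothetical names. It is not the authors' implementation; a library such as imbalanced-learn provides a complete `SMOTE` class.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical minimal SMOTE: create n_new synthetic minority samples.

    minority: list of numeric feature vectors (all the same length).
    Each synthetic point lies between a randomly chosen minority sample
    and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Hypothetical toy data: 3 minority samples vs. 6 majority samples.
minority = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.4]]
majority_count = 6
new_points = smote(minority, majority_count - len(minority), k=2)
print(len(new_points))  # → 3
```

Setting `n_new` to the majority count minus the minority count yields equally sized classes. Interpolating, rather than copying, keeps the synthetic samples inside the region spanned by the real minority data.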
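The HEV classifier combines RF, AdaBoost, gradient boosting, and extra trees by hard voting: each base model predicts a class label and the most frequent label wins. The sketch below illustrates only this voting step, with the trained base models replaced by hypothetical prediction lists (0 = normal, 1 = CAD is an assumed encoding). The function `hard_vote` is a hypothetical name, not the authors' code. With four voters a 2-2 tie is possible; this sketch breaks ties in favour of the smallest label, which matches the ascending-sort-order rule that scikit-learn's `VotingClassifier(voting="hard")` documents for ties.

```python
from collections import Counter


def hard_vote(predictions):
    """Hypothetical majority (hard) vote over per-model label predictions.

    predictions[m][i] is the class label model m assigns to sample i.
    Ties are broken in ascending label order (smallest tied label wins).
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)
        top = max(votes.values())
        combined.append(min(label for label, c in votes.items() if c == top))
    return combined


# Hypothetical predictions from four base models on five patients.
preds = {
    "random_forest":     [1, 0, 1, 1, 0],
    "adaboost":          [1, 0, 0, 1, 1],
    "gradient_boosting": [1, 1, 1, 0, 0],
    "extra_trees":       [0, 0, 1, 1, 1],
}
print(hard_vote(list(preds.values())))  # → [1, 0, 1, 1, 0]
```

The last patient receives two votes for each class, so the tie-break rule decides the outcome; with an odd number of voters, ties between two classes cannot occur.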