Praveen S Phani, Hasan Mohammad Kamrul, Abdullah Siti Norul Huda Sheikh, Sirisha Uddagiri, Tirumanadham N S Koti Mani Kumar, Islam Shayla, Ahmed Fatima Rayan Awad, Ahmed Thowiba E, Noboni Ayman Afrin, Sampedro Gabriel Avelino, Yeun Chan Yeob, Ghazal Taher M
Department of Computer Science and Engineering, Prasad V Potluri Siddhartha Institute of Technology, Vijayawada, India.
Faculty of Information Science and Technology, University Kebangsaan Malaysia, Bangi, Selangor, Malaysia.
Front Med (Lausanne). 2024 Jul 5;11:1407376. doi: 10.3389/fmed.2024.1407376. eCollection 2024.
Cardiovascular disease (CVD) remains one of the leading causes of death worldwide, and better diagnostic methods are needed to detect early signs effectively and predict disease outcomes. Current diagnostic tools are cumbersome and imprecise, especially for complex diseases, which motivates the use of new machine learning applications in differential diagnosis.
This paper presents a new machine learning approach that uses Multiple Imputation by Chained Equations (MICE) to handle missing data, the interquartile range (IQR) to handle outliers, and the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance. To select optimal features, we introduce a Hybrid 2-Tier Grasshopper Optimization with L2 regularization methodology, which we call GOL2-2T. To improve predictive modelling, we use an AdaBoost decision fusion (ABDF) ensemble learning algorithm, with the babysitting technique for hyperparameter tuning. Accuracy, recall, and AUC score are used to assess the model.
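As a rough illustration of two of the preprocessing steps named above, the following minimal sketch shows IQR-based outlier clipping and SMOTE-style interpolation in plain Python. It is not the authors' implementation: the function names, the 1.5 × IQR fence multiplier, the choice to clip rather than drop outliers, and the neighbour count are all assumptions made for this example, since the abstract does not specify them.

```python
import random
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences; k=1.5 is a conventional value, assumed here."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Clip values outside the IQR fences to the nearest fence (an assumed policy)."""
    lo, hi = iqr_bounds(values, k)
    return [min(max(v, lo), hi) for v in values]


def smote_like(minority, n_new, k_neighbors=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (the core idea behind SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k_neighbors])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data, for illustration only.
cleaned = clip_outliers([1, 2, 3, 4, 100])
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like(minority_points, n_new=5)
```

Because each synthetic point lies on a segment between two existing minority samples, oversampling of this kind stays inside the region the minority class already occupies. In practice one would more likely rely on an established library implementation than on a hand-written version like this one.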
Our heart disease prediction model achieved an accuracy of 83.0% and a balanced F1 score of 84.0%. Integrating SMOTE, IQR outlier detection, MICE, and GOL2-2T feature selection improved both robustness and predictive performance. ABDF further refined the model and proved highly effective at predicting heart disease.
These findings demonstrate the effectiveness of such machine learning methodologies in medical diagnostics, including improved early recognition and trustworthy tools for clinicians. However, the model's applicability and scope depend on the dataset used. Further work is needed to replicate the model across different datasets and samples; as with most models, it will be important to determine whether the results generalize to populations that differ from the patient population used in the current study.