Kaur Balraj Preet, Singh Harpreet, Hans Rahul, Sharma Sanjeev Kumar, Sharma Chetna, Hassan Md Mehedi
Department of Computer Science and Engineering, DAV University, Jalandhar, Punjab, India.
Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, India.
PLoS One. 2024 Dec 2;19(12):e0308015. doi: 10.1371/journal.pone.0308015. eCollection 2024.
Machine learning is now widely studied as a tool for disease diagnosis. COVID-19, one of the deadliest respiratory diseases of recent times, causes serious damage to the lungs and has claimed many lives worldwide. Machine learning-based systems can assist clinicians in diagnosing the disease early, which can reduce its deadly effects. Successful deployment of such systems depends on two important issues: hyperparameter optimization and feature selection. Motivated by this, we design an improved model that predicts the presence of respiratory disease in patients by incorporating both. A genetic algorithm is proposed to optimize the hyperparameters of the machine learning algorithms, and the binary grey wolf optimization algorithm is used for feature selection to reduce the size of the feature set. To further improve the predictions made by the hyperparameter-optimized models, an ensemble model is built using a stacking classifier. Explainable AI is also incorporated, with SHapley Additive exPlanations (SHAP) values used to determine feature importance. The experiments used the publicly accessible Mexico clinical dataset of COVID-19. The results were assessed with accuracy, precision, recall, AUC, and F1-score. They show that the proposed model achieves higher prediction accuracy than its counterparts, and that AdaBoost outperformed all the other hyperparameter-optimized algorithms.
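The genetic-algorithm hyperparameter search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the search space (`n_estimators`, `learning_rate`), the operators (tournament selection, uniform crossover, bounded mutation, elitism), and the `toy_fitness` function are all assumptions chosen for illustration. In a real pipeline the fitness would presumably be a cross-validated score of the classifier being tuned.

```python
import random

# Hypothetical search space; names and ranges are illustrative, not from the paper.
SPACE = {
    "n_estimators": (10, 200),     # integer range
    "learning_rate": (0.01, 1.0),  # float range
}

def random_individual(rng):
    """Draw one candidate hyperparameter setting uniformly from SPACE."""
    n_lo, n_hi = SPACE["n_estimators"]
    r_lo, r_hi = SPACE["learning_rate"]
    return {"n_estimators": rng.randint(n_lo, n_hi),
            "learning_rate": rng.uniform(r_lo, r_hi)}

def toy_fitness(ind):
    """Stand-in for a cross-validated score (assumption).
    Peaks at n_estimators=120, learning_rate=0.3."""
    a = 1.0 - abs(ind["n_estimators"] - 120) / 190
    b = 1.0 - abs(ind["learning_rate"] - 0.3) / 0.99
    return 0.5 * (a + b)

def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]

def crossover(p1, p2, rng):
    """Uniform crossover: each gene comes from either parent with equal chance."""
    return {key: (p1 if rng.random() < 0.5 else p2)[key] for key in SPACE}

def mutate(ind, rng, rate=0.2):
    """Perturb each gene with probability `rate`, clipping to the bounds."""
    child = dict(ind)
    if rng.random() < rate:
        lo, hi = SPACE["n_estimators"]
        step = rng.randint(-20, 20)
        child["n_estimators"] = min(hi, max(lo, child["n_estimators"] + step))
    if rng.random() < rate:
        lo, hi = SPACE["learning_rate"]
        step = rng.gauss(0, 0.1)
        child["learning_rate"] = min(hi, max(lo, child["learning_rate"] + step))
    return child

def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    """Evolve a population and return (best individual, best score per generation)."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        best_i = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best_i])
        next_pop = [dict(pop[best_i])]  # elitism: carry the best forward unchanged
        while len(next_pop) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            next_pop.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = next_pop
    scores = [fitness(ind) for ind in pop]
    best_i = max(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best_i])
    return pop[best_i], history

if __name__ == "__main__":
    best, history = genetic_search(toy_fitness)
    print(best, history[-1])
```

Because the best individual is copied unchanged into each new generation, the best score recorded per generation never decreases. Swapping `toy_fitness` for a function that trains and scores a model on held-out folds would turn this sketch into an actual tuning loop, at the cost of one model fit per fitness evaluation.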