Fan Yongxian, Liu Meng, Sun Guicong
School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China.
PLoS One. 2023 Sep 21;18(9):e0291961. doi: 10.1371/journal.pone.0291961. eCollection 2023.
Coronaviruses have affected the lives of people around the world, and a growing number of studies indicate that the virus is mutating and becoming more contagious. The pressing priority is therefore to predict patient outcomes quickly and accurately. At the same time, physicians and patients increasingly require interpretability from the machine learning models used in healthcare. We propose KISM, an interpretable machine learning framework that diagnoses COVID-19 and assesses patient prognosis from blood test data. First, we pre-process the original blood test datasets with k-nearest neighbors, isolation forest, and SMOTE. Seven machine learning tools (Support Vector Machine, Extra Trees, Random Forest, Gradient Boosting Decision Tree, eXtreme Gradient Boosting, Logistic Regression, and ensemble learning) are then used for COVID-19 diagnosis and prognosis. In addition, we use SHAP and scikit-learn post-hoc interpretability methods to report feature importance, allowing healthcare professionals and the models to interact and surface biomarkers that some doctors may have missed. Ten-fold cross-validation on two public datasets shows that KISM outperforms current state-of-the-art methods. In the COVID-19 diagnosis task, KISM achieves an AUC of 0.9869 and an accuracy of 0.9787, and leukocytes, platelets, and C-reactive protein (the dataset feature "Proteina C reativa mg/dL") emerge as the most indicative biomarkers. In the COVID-19 prognosis task, it achieves an AUC of 0.9949 and an accuracy of 0.9677, and Age, LYMPH (lymphocytes), and WBC (white blood cell count) emerge as the most indicative biomarkers of disease severity.
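The abstract names k-nearest neighbours and SMOTE as pre-processing steps without giving details. The sketch below illustrates both under stated assumptions: it assumes (as is common, but not stated in the abstract) that k-nearest neighbours is used to fill missing values, that rows are lists of already-standardized numbers with `None` marking a missing entry, and that SMOTE is applied to a binary problem. The function names `nan_euclidean`, `knn_impute`, and `smote` are hypothetical, the isolation-forest outlier step is omitted, and the toy values are made up. It is not the authors' KISM code; in practice scikit-learn's `KNNImputer` and `IsolationForest` and imbalanced-learn's `SMOTE` are the usual choices.

```python
import math
import random


def nan_euclidean(a, b):
    """Euclidean distance over coordinates observed in both rows (None = missing),
    scaled by total/observed coordinates, in the spirit of scikit-learn's
    nan_euclidean_distances."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(squared * len(a) / len(pairs))


def knn_impute(rows, k=2):
    """Replace each missing value with the mean of that feature over the k
    nearest other rows in which the feature is observed."""
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
            donors.sort(key=lambda r: nan_euclidean(row, r))
            nearest = donors[:k]
            if nearest:  # leave the gap if no other row observes this feature
                new_row[j] = sum(r[j] for r in nearest) / len(nearest)
        filled.append(new_row)
    return filled


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples: pick a random minority sample,
    pick one of its k nearest minority neighbours, and interpolate at a random
    point on the segment between them. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: math.dist(base, m),
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Made-up standardized values for [leukocytes, platelets, CRP]; None = missing.
rows = [
    [0.1, 0.2, -0.1],
    [0.2, None, 0.0],
    [0.0, 0.3, -0.2],
    [1.5, -1.0, None],
    [1.4, -1.1, 1.2],
    [1.6, -0.9, 1.0],
]
labels = [0, 0, 0, 0, 1, 1]  # hypothetical labels; class 1 is the minority

filled = knn_impute(rows, k=2)
minority = [r for r, y in zip(filled, labels) if y == 1]
n_new = labels.count(0) - labels.count(1)  # oversample until classes balance
synthetic = smote(minority, n_new, k=2, seed=0)
print(filled[1][1], filled[3][2])
print(synthetic)
```

Because every synthetic point lies on a segment between two real minority samples, SMOTE never leaves the minority class's convex hull. When combined with cross-validation, oversampling is normally applied to the training folds only, so synthetic points do not leak into the evaluation folds.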
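The abstract reports AUC and accuracy from 10-fold cross-validation and lists ensemble learning among the seven classifiers. The stdlib-only sketch below shows how those quantities can be computed, assuming binary labels (1 = positive class) and a soft-voting ensemble that averages the models' scores; the abstract does not say which ensembling scheme KISM uses. All names (`roc_auc`, `accuracy`, `soft_vote`, `stratified_folds`, `cross_validate`, `centroid_scorer`) are hypothetical, and the nearest-centroid scorer and toy data are stand-ins, not the paper's models or datasets.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive scores higher
    than a randomly chosen negative (Mann-Whitney form); ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def soft_vote(score_lists):
    """Average the positive-class scores that several models give the same samples."""
    return [sum(col) / len(col) for col in zip(*score_lists)]


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds, dealing each class round-robin so every
    fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for rank, i in enumerate(members):
            folds[rank % k].append(i)
    return folds


def cross_validate(labels, fit_predict, k=10, seed=0):
    """Mean AUC over k held-out folds. fit_predict(train_idx, test_idx) must
    train on train_idx only and return one score per index in test_idx."""
    aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        scores = fit_predict(train_idx, test_idx)
        aucs.append(roc_auc([labels[i] for i in test_idx], scores))
    return sum(aucs) / len(aucs)


# Hypothetical scores from two models on six samples.
y = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
model_b = [0.35, 0.9, 0.8, 0.1, 0.4, 0.2]
ensemble = soft_vote([model_a, model_b])
print(roc_auc(y, model_a), roc_auc(y, model_b), roc_auc(y, ensemble))
print(accuracy(y, model_a), accuracy(y, ensemble))

# Toy 10-fold run on one made-up feature.
labels = [1] * 10 + [0] * 30
feature = [2.0 + 0.1 * i for i in range(10)] + [0.05 * i for i in range(30)]


def centroid_scorer(train_idx, test_idx):
    """Stand-in model: score = how much closer a sample is to the positive
    training centroid than to the negative one (squared distances)."""
    pos = [feature[i] for i in train_idx if labels[i] == 1]
    neg = [feature[i] for i in train_idx if labels[i] == 0]
    mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return [(feature[i] - mu_neg) ** 2 - (feature[i] - mu_pos) ** 2 for i in test_idx]


print(cross_validate(labels, centroid_scorer, k=10))
```

In this toy case each model ranks one positive below one negative (AUC 8/9), but the errors fall on different samples, so the averaged scores separate the classes completely. Averaging per-fold AUCs, as done here, is one common convention; pooling the out-of-fold scores and computing a single AUC is another, and the abstract does not say which was used.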