Alqaissi Eman, Alotaibi Fahd, Ramzan Muhammad Sher
Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia.
Information Systems, King Khalid University, Abha, Saudi Arabia.
PeerJ Comput Sci. 2023 Apr 10;9:e1333. doi: 10.7717/peerj-cs.1333. eCollection 2023.
COVID-19 is an infectious disease caused by SARS-CoV-2. Its symptoms range from mild to moderate respiratory illness, and it sometimes requires urgent medication. It is therefore crucial to detect COVID-19 at an early stage through specific clinical tests, testing kits, and medical devices. However, these tests are not always available during a pandemic. This study therefore developed an automatic, intelligent, rapid, real-time diagnostic model for the early detection of COVID-19 based on its symptoms.
A COVID-19 knowledge graph (KG), constructed from literature drawn from heterogeneous data sources, was imported to understand the different relations involving COVID-19. We added the Human Disease Ontology to the COVID-19 KG and applied a node-embedding graph algorithm, fast random projection, to extract an additional feature from the COVID-19 dataset. Experiments were then conducted using two machine learning (ML) pipelines to predict COVID-19 infection from its symptoms. The model hyperparameters were tuned automatically.
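To make the node-embedding step concrete, the following is a minimal, simplified sketch of the fast random projection idea in plain Python. It is not the authors' implementation. The toy graph, the function names `fast_rp` and `l2_normalise`, and the parameter values (`dim`, `weights`, `sparsity`, `seed`) are all assumptions chosen for illustration. The sketch draws a very sparse random vector per node, repeatedly averages each node's neighbour vectors, and sums the per-iteration results using the given weights.

```python
import random
from math import sqrt


def l2_normalise(row):
    """Scale a vector to unit length; leave all-zero vectors unchanged."""
    norm = sqrt(sum(x * x for x in row))
    return row if norm == 0 else [x / norm for x in row]


def fast_rp(edges, num_nodes, dim=8, weights=(0.0, 1.0, 1.0), sparsity=3, seed=0):
    """Simplified fast-random-projection embedding of an undirected graph.

    All parameter values are illustrative assumptions, not settings
    taken from the study.
    """
    rng = random.Random(seed)

    # Build undirected adjacency lists.
    nbrs = [[] for _ in range(num_nodes)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    # Very sparse random projection: each entry is +sqrt(s) or -sqrt(s)
    # with probability 1/(2s) each, and 0 otherwise.
    scale = sqrt(sparsity)

    def rand_entry():
        r = rng.random()
        if r < 1 / (2 * sparsity):
            return scale
        if r < 1 / sparsity:
            return -scale
        return 0.0

    current = [[rand_entry() for _ in range(dim)] for _ in range(num_nodes)]
    emb = [[0.0] * dim for _ in range(num_nodes)]

    for w in weights:
        # One propagation step: replace each node's vector by the mean of
        # its neighbours' vectors, then normalise to unit length.
        nxt = []
        for n in range(num_nodes):
            if nbrs[n]:
                k = len(nbrs[n])
                nxt.append([sum(current[m][d] for m in nbrs[n]) / k for d in range(dim)])
            else:
                nxt.append([0.0] * dim)
        current = [l2_normalise(row) for row in nxt]

        # Accumulate this iteration's vectors with its weight.
        for n in range(num_nodes):
            for d in range(dim):
                emb[n][d] += w * current[n][d]
    return emb


# Hypothetical toy graph: node 0 is linked to nodes 1, 2 and 3; node 3 to node 4.
toy_edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
embedding = fast_rp(toy_edges, num_nodes=5)
```

In this toy graph, nodes 1 and 2 have the same single neighbour, so they receive identical vectors, which illustrates how the embedding encodes neighbourhood structure. Each row of `embedding` could then be appended to a record's tabular features before training a classifier.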
We compared two graph-based ML models: logistic regression (LR) and random forest (RF). The proposed graph-based RF model achieved a low error rate (0.0064) and the best scores on all performance metrics: specificity 98.71%, accuracy 99.36%, precision 99.65%, recall 99.53%, and F1-score 99.59%. Furthermore, the Matthews correlation coefficient achieved by the RF model was higher than that of the LR model. Comparative analysis with other ML algorithms and with studies from the literature showed that the proposed RF model exhibited the best detection accuracy.
The graph-based RF model performed well in classifying the symptoms of COVID-19 infection, indicating that graph data science, in conjunction with ML techniques, helps improve performance and accelerate innovation.