Villavicencio Charlyn Nayve, Macrohon Julio Jerison, Inbaraj Xavier Alphonse, Jeng Jyh-Horng, Hsieh Jer-Guang
Department of Information Engineering, I-Shou University, Kaohsiung City 84001, Taiwan.
College of Information and Communications Technology, Bulacan State University, Malolos City 3000, Philippines.
Diagnostics (Basel). 2022 Mar 27;12(4):821. doi: 10.3390/diagnostics12040821.
Detecting the presence of a disease requires laboratory tests, testing kits, and devices; however, these are not always on hand. This study proposes a new approach to disease detection that uses machine learning algorithms to analyze the symptoms a person is experiencing, without requiring laboratory tests. Six supervised machine learning algorithms (J48 decision tree, random forest, support vector machine, k-nearest neighbors, naïve Bayes, and artificial neural networks) were applied to the "COVID-19 Symptoms and Presence Dataset" from Kaggle. Through hyperparameter optimization and 10-fold cross-validation, we tuned each algorithm to the best performance it attained. A comparative analysis was performed according to accuracy, sensitivity, specificity, and area under the ROC curve. Results show that random forest, support vector machine, k-nearest neighbors, and artificial neural networks outperformed the other algorithms, attaining 98.84% accuracy, 100% sensitivity, 98.79% specificity, and 98.84% area under the ROC curve. Finally, we developed a web application that allows users to select the symptoms they are currently experiencing and uses the developed prediction model to predict the presence of COVID-19. Based on this mechanism, the proposed method can predict the presence or absence of COVID-19 in a person in real time, without laboratory tests, kits, or devices.
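The evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `binary_metrics` and `k_fold_indices` are hypothetical, labels are assumed to be coded 1 (COVID-19 present) and 0 (absent), and the folds are plain contiguous splits, whereas the tooling used in the study may shuffle or stratify the data.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for labels coded 1 (positive) / 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        # share of all cases classified correctly
        "accuracy": (tp + tn) / total if total else nan,
        # share of actual positives that were detected (true positive rate)
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # share of actual negatives that were cleared (true negative rate)
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # the first `extra` folds take one additional sample each
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

For example, `binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 1, 0, 1, 0, 0, 1])` gives an accuracy of 0.875, a sensitivity of 1.0, and a specificity of 0.75: every positive case is caught, but one of the four negative cases is flagged incorrectly. In 10-fold cross-validation, a model is trained on each `train` split and scored on the matching `test` split, and the ten scores are then averaged.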