Department of Computer Engineering, College of Technology, University of Buea, Buea, Cameroon.
Department of Training, Research Development and Innovation, InchTech's Solutions, Yaoundé, Cameroon.
J Healthc Eng. 2021 Nov 22;2021:4733167. doi: 10.1155/2021/4733167. eCollection 2021.
Our analysis and machine learning algorithm are based on the two most cited clinical datasets in the literature: one from San Raffaele Hospital in Milan, Italy, and the other from Hospital Israelita Albert Einstein in São Paulo, Brazil. The datasets were processed to select the features that most influence the target, and almost all of the selected features turned out to be blood parameters. Exploratory data analysis (EDA) methods were applied to the datasets, and a comparative study of supervised machine learning models was carried out, after which the support vector machine (SVM) was selected as the best-performing model.
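The comparative study described above can be illustrated with a minimal sketch. It assumes scikit-learn, a synthetic stand-in dataset rather than either clinical dataset, and an arbitrary set of candidate models with 5-fold cross-validated accuracy as the selection criterion; none of these choices is confirmed by the abstract as the authors' actual protocol.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a table of blood parameters (assumption: NOT the study's data).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Hypothetical candidate list; the abstract does not name the models compared.
# Scale-sensitive models are wrapped in a pipeline so scaling is fit per fold.
candidates = {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Keep the model with the highest mean accuracy.
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

On real clinical data the ranking may differ, and a criterion other than accuracy (for example sensitivity) may be more appropriate when the classes are imbalanced.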
As the best performer, SVM is used as our proposed supervised machine learning algorithm. After optimizing the SVM, we obtained an accuracy of 99.29%, a sensitivity of 92.79%, and a specificity of 100% on the dataset from Kaggle (https://www.kaggle.com/einsteindata4u/covid19). The same procedure was applied to the dataset from San Raffaele Hospital (https://zenodo.org/record/3886927#.YIluB5AzbMV). Once again, the SVM outperformed the other machine learning algorithms, with an accuracy of 92.86%, a sensitivity of 93.55%, and a specificity of 90.91%.
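For readers less familiar with the three reported metrics, the following sketch shows how accuracy, sensitivity, and specificity follow from the confusion matrix of a binary classifier. The function names and the toy labels are illustrative assumptions, not the authors' code or data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall on positives), and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        # Sensitivity: share of truly positive cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Specificity: share of truly negative cases that were cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Toy labels (hypothetical): 4 positive and 6 negative cases, one positive missed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

Because accuracy is a weighted average of sensitivity and specificity, it can exceed sensitivity by a wide margin when negative cases dominate the test set, which is why the three figures are reported together.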
Compared with other results in the literature obtained on the same datasets, our results are superior, leading us to conclude that the proposed solution is reliable for COVID-19 diagnosis.