Marateb Hamid Reza, Ziaie Nezhad Farzad, Mohebian Mohammad Reza, Sami Ramin, Haghjooy Javanmard Shaghayegh, Dehghan Niri Fatemeh, Akafzadeh-Savari Mahsa, Mansourian Marjan, Mañanas Miquel Angel, Wolkewitz Martin, Binder Harald
The Biomedical Engineering Department, Engineering Faculty, University of Isfahan, Isfahan, Iran.
Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, SK, Canada.
Front Med (Lausanne). 2021 Nov 18;8:768467. doi: 10.3389/fmed.2021.768467. eCollection 2021.
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was a disaster in 2020. Accurate and early diagnosis of COVID-19 is still essential for health policymaking. Reverse transcriptase-polymerase chain reaction (RT-PCR) has served as the operational gold standard for COVID-19 diagnosis. We aimed to design and implement a reliable COVID-19 diagnosis method that estimates the risk of infection from demographics, symptoms and signs, blood markers, and family history of diseases, and that agrees closely with the results obtained by RT-PCR and CT scan. Our study primarily used sample data from a 1-year hospital-based prospective COVID-19 open cohort, the Khorshid COVID Cohort (KCC) study. Dataset 1, comprising 634 patients with COVID-19 and a control group of 118 patients with pneumonia and similar characteristics whose RT-PCR and chest CT scan were negative, was used to design the system and for internal validation. Two other online datasets, one of symptoms (dataset 2) and one of blood tests (dataset 3), were also analyzed. A combination of one-hot encoding, stability feature selection, over-sampling, and an ensemble classifier was used. Ten-fold stratified cross-validation was performed. The selected features comprised gender and symptom duration as well as signs and symptoms, blood biomarkers, and comorbidities. Performance indices of the cross-validated confusion matrix for dataset 1 were as follows: sensitivity of 96% [95% confidence interval (CI): 94-98], specificity of 95% [90-99], positive predictive value (PPV) of 99% [98-100], negative predictive value (NPV) of 82% [76-89], diagnostic odds ratio (DOR) of 496 [198-1,245], area under the ROC curve (AUC) of 0.96 [0.94-0.97], Matthews correlation coefficient (MCC) of 0.87 [0.85-0.88], accuracy of 96% [94-98], and Cohen's kappa of 0.86 [0.81-0.91]. The proposed algorithm showed excellent diagnostic accuracy and class-labeling agreement, and fair discriminant power. The AUC on datasets 2 and 3 was 0.97 [0.96-0.98] and 0.92 [0.91-0.94], respectively. The most important features were white blood cell count, shortness of breath, and C-reactive protein for datasets 1, 2, and 3, respectively. The proposed algorithm is thus a promising COVID-19 diagnosis method that could supplement simple blood tests and screening of symptoms. However, RT-PCR and chest CT scan, used here as the gold standard, are not 100% accurate.
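The performance indices listed in the abstract all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard formulas, not the authors' code. The function name `diagnostic_metrics` and the example counts are assumptions: the counts are hypothetical values chosen only to be consistent with the reported class sizes (634 cases, 118 controls) and are not the published confusion matrix. Confidence intervals and the AUC are not computed here.

```python
import math


def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic indices from a 2x2 confusion matrix.

    tp/fn: diseased patients classified positive/negative.
    tn/fp: control patients classified negative/positive.
    """
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    # Chance agreement for Cohen's kappa, from the row and column marginals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": accuracy,
        # Diagnostic odds ratio; undefined when fp or fn is zero.
        "dor": (tp * tn) / (fp * fn),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }


if __name__ == "__main__":
    # Hypothetical counts (assumption): 634 cases and 118 controls in total.
    for name, value in diagnostic_metrics(tp=609, fn=25, tn=112, fp=6).items():
        print(f"{name}: {value:.3f}")
```

With only 118 controls, a handful of misclassified patients moves the NPV and the DOR considerably, which is consistent with the wide intervals reported for those two indices.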