Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Mol Med. 2022 Aug 3;28(1):86. doi: 10.1186/s10020-022-00513-5.
Despite improvements in controlling the COVID-19 pandemic, the lack of comprehensive insight into SARS-CoV-2 pathogenesis remains a complex challenge. To address it, we used bioinformatics and machine learning algorithms to further characterize SARS-CoV-2 pathogenesis and to introduce novel host response-based diagnostic biomarker panels.
In the present study, eight published RNA-Seq datasets of whole-blood (WB) and nasopharyngeal (NP) swab samples from patients with COVID-19, patients with other viral and non-viral acute respiratory illnesses (ARIs), and healthy controls (HCs) were integrated. To define COVID-19 meta-signatures, Gene Ontology and pathway enrichment analyses were applied to compare COVID-19 with other similar diseases. Additionally, CIBERSORTx was run on WB samples to characterize the immune cell landscape. Finally, the optimum WB- and NP-based diagnostic biomarkers were identified by evaluating all combinations of 3 to 9 selected features with a two-phase machine learning (ML) method that used k-fold cross-validation and independent test set validation.
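The combination search described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the classifier, the value of k, or the candidate features, so the nearest-centroid classifier, k = 5, the synthetic five-feature data, and every function name below are assumptions chosen only to keep the example self-contained. Phase one scores each feature subset by k-fold cross-validation on the training data; phase two evaluates only the winning panel on a held-out test set.

```python
import itertools
import random
import statistics

def nearest_centroid_predict(train_X, train_y, test_X):
    """Assign each test row to the class with the closest centroid (assumed stand-in classifier)."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return [
        min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        for x in test_X
    ]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def select(X, cols):
    """Keep only the chosen feature columns."""
    return [[row[c] for c in cols] for row in X]

def kfold_accuracy(X, y, k, seed=0):
    """Phase 1: mean accuracy over k shuffled folds of the training data."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        preds = nearest_centroid_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold]
        )
        scores.append(accuracy([y[i] for i in fold], preds))
    return statistics.fmean(scores)

def best_panel(X_train, y_train, X_test, y_test, sizes=range(3, 10), k=5):
    """Try every feature subset of size 3 to 9, pick the best by CV, then test it once."""
    n_features = len(X_train[0])
    best_cols, best_cv = None, -1.0
    for size in sizes:
        # combinations() yields nothing when size exceeds the number of features
        for cols in itertools.combinations(range(n_features), size):
            cv = kfold_accuracy(select(X_train, cols), y_train, k)
            if cv > best_cv:
                best_cols, best_cv = cols, cv
    # Phase 2: independent test set, used only for the selected panel
    preds = nearest_centroid_predict(
        select(X_train, best_cols), y_train, select(X_test, best_cols)
    )
    return best_cols, best_cv, accuracy(y_test, preds)

def make_data(n, rng, n_features=5):
    """Synthetic data (hypothetical): the first three features separate the classes."""
    X, y = [], []
    for i in range(n):
        label = "covid" if i % 2 == 0 else "control"
        shift = 2.0 if label == "covid" else 0.0
        X.append([rng.gauss(shift if j < 3 else 0.0, 1.0) for j in range(n_features)])
        y.append(label)
    return X, y

rng = random.Random(1)
X_train, y_train = make_data(60, rng)
X_test, y_test = make_data(40, rng)
panel, cv_acc, test_acc = best_panel(X_train, y_train, X_test, y_test)
print(panel, round(cv_acc, 2), round(test_acc, 2))
```

Keeping the test set out of the selection loop is the point of the two-phase design: the cross-validated score chooses the panel, and the test accuracy is then an unbiased check on that single choice. The number of subsets grows quickly with the number of candidate features, which is presumably why the search is run over a pre-selected feature list.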
The host gene meta-signatures obtained for SARS-CoV-2 infection differed between the WB and NP samples. The Gene Ontology and enrichment results for the WB dataset indicated an enhanced inflammatory host response, cell cycle activity, and interferon signature in COVID-19 patients. Furthermore, NP samples from COVID-19 patients, compared with those from HCs and non-viral ARIs, showed significant upregulation of genes associated with cytokine production and defense response to the virus. In contrast, these pathways were strikingly attenuated in COVID-19 compared with other viral ARIs. Notably, immune cell proportions in WB samples were altered in COVID-19 versus HCs. Moreover, after two phases of ML-based validation, the optimum WB- and NP-based diagnostic panels included 6 and 8 markers, with accuracies of 97% and 88%, respectively.
Based on the distinct gene expression profiles of WB and NP samples, our results indicate that SARS-CoV-2 function is body-site-specific, although the signature shared by WB and NP COVID-19 samples versus controls shows that the virus also induces a global, systemic host response to some extent. We also introduced and validated WB- and NP-based diagnostic biomarkers using ML methods, which can serve as a complementary tool for distinguishing COVID-19 from non-COVID cases.