Zhang Yu-Hang, Li Hao, Zeng Tao, Chen Lei, Li Zhandong, Huang Tao, Cai Yu-Dong
School of Life Sciences, Shanghai University, Shanghai, China.
Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States.
Front Cell Dev Biol. 2021 Jan 11;8:627302. doi: 10.3389/fcell.2020.627302. eCollection 2020.
The worldwide coronavirus disease 2019 (COVID-19) pandemic was triggered by the wide spread of a new coronavirus strain, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Multiple studies on the pathogenesis of SARS-CoV-2 were conducted immediately after the disease began to spread. However, the molecular pathogenesis of the virus and related diseases has not yet been fully revealed. In this study, we attempted to identify new transcriptomic signatures as candidate diagnostic models for clinical testing or as therapeutic targets for vaccine design. Using recently reported transcriptomics data from upper airway tissue of patients with acute respiratory illnesses, we integrated multiple machine learning methods to identify effective qualitative biomarkers and quantitative rules for distinguishing SARS-CoV-2 infection from other infectious diseases. The transcriptomics data were first analyzed with Boruta to select important features, which were then further evaluated by the minimum redundancy maximum relevance method to produce a feature list. This list was fed into incremental feature selection, combined with several classification algorithms, to extract qualitative biomarker genes and construct quantitative rules. In addition, an efficient classifier was built to identify patients infected with SARS-CoV-2. The findings reported in this study may help reveal the potential pathogenic mechanisms of COVID-19 and identify new targets for vaccine design.
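The incremental feature selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a ranked feature list is already available (standing in for the output of Boruta followed by minimum redundancy maximum relevance), uses a simple nearest-centroid classifier with leave-one-out cross-validation as a stand-in for the unspecified classification algorithms, and runs on made-up toy values rather than real expression data. All function and variable names are hypothetical.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to sample x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # sorted() makes tie-breaking between labels deterministic
    return min(sorted(centroids), key=lambda lab: sq_dist(centroids[lab], x))


def loo_accuracy(X, y):
    """Leave-one-out accuracy of the stand-in nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Evaluate the top-k ranked features for k = 1..N and keep the best k.

    Ties are resolved in favour of the smaller feature set.
    """
    best_k, best_score, scores = 0, -1.0, []
    for k in range(1, len(ranked_features) + 1):
        cols = ranked_features[:k]
        X_k = [[row[c] for c in cols] for row in X]
        score = loo_accuracy(X_k, y)
        scores.append(score)
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score, scores


# Toy data (invented for illustration): column 0 separates the classes,
# columns 1 and 2 are noise.
X = [
    [5.0, 1.0, 0.3],
    [5.2, 0.2, 0.9],
    [4.8, 0.7, 0.1],
    [1.0, 0.9, 0.2],
    [1.1, 0.1, 0.8],
    [0.9, 0.6, 0.4],
]
y = ["SARS-CoV-2", "SARS-CoV-2", "SARS-CoV-2", "other", "other", "other"]
ranked_features = [0, 2, 1]  # assumed ranking, most relevant feature first

selected, best_score, scores = incremental_feature_selection(X, y, ranked_features)
print("selected features:", selected)
print("best leave-one-out accuracy:", best_score)
```

In this sketch the smallest feature subset that reaches the highest cross-validated accuracy is kept, which is one common way to pick the optimal subset in incremental feature selection; other selection criteria or classifiers could be substituted.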