Identifying COVID-19 Severity-Related SARS-CoV-2 Mutation Using a Machine Learning Method.

Authors

Huang Feiming, Chen Lei, Guo Wei, Zhou Xianchao, Feng Kaiyan, Huang Tao, Cai Yudong

Affiliations

School of Life Sciences, Shanghai University, Shanghai 200444, China.

College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.

Publication information

Life (Basel). 2022 May 28;12(6):806. doi: 10.3390/life12060806.

Abstract

SARS-CoV-2 shows great evolutionary capacity, accumulating genomic variation at high frequency during transmission. Evolved SARS-CoV-2 often resists earlier vaccines and can lead to poor clinical status in patients. Mutations in the SARS-CoV-2 genome affect both structural and nonstructural proteins, and some of these proteins, such as the spike protein, have been shown to be directly associated with the clinical status of patients with severe COVID-19 pneumonia. In this study, we collected genome-wide mutation information for virulent strains together with the severity of COVID-19 pneumonia in patients of differing clinical status. Important protein mutations and untranslated region mutations were then extracted using machine learning methods. First, using Boruta and four ranking algorithms (least absolute shrinkage and selection operator, light gradient boosting machine, max-relevance and min-redundancy, and Monte Carlo feature selection), mutations highly correlated with the patients' clinical status were screened out and ranked in four feature lists. Some mutations, such as D614G and V1176F, were shown to be associated with viral infectivity. Moreover, previously unreported mutations, such as A320V and I164ILV of nsp14, were also identified, which suggests that they may play a role. We then applied the incremental feature selection method to each feature list to construct efficient classifiers that can be used directly to distinguish the clinical status of COVID-19 patients. In addition, four sets of quantitative rules were established, which make the role of each mutation in differentiating the clinical status of COVID-19 patients easier to interpret. The identified key mutations, linked to virologic properties, will help clarify the mechanisms of infection and will aid the development of antiviral treatments.
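The incremental feature selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the classifiers or the evaluation protocol, so the nearest-centroid classifier, the leave-one-out accuracy, the function names, and the tiny synthetic mutation matrix below are all assumptions made purely for illustration. The idea shown is only the general one: take a ranked feature list, evaluate a classifier on the top 1, top 2, ... features, and keep the prefix that scores best.

```python
from typing import List, Sequence, Tuple


def nearest_centroid_predict(train_X: List[List[int]], train_y: List[str],
                             x: Sequence[int]) -> str:
    """Return the label whose class centroid is closest to x.

    Stand-in classifier (an assumption, not the paper's choice).
    """
    best_label, best_dist = "", float("inf")
    for label in sorted(set(train_y)):
        members = [row for row, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X: List[List[int]], y: List[str]) -> float:
    """Leave-one-out accuracy of the stand-in classifier."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            hits += 1
    return hits / len(X)


def incremental_feature_selection(X: List[List[int]], y: List[str],
                                  ranked: List[int]) -> Tuple[int, List[float]]:
    """Score the top-k prefix of a ranked feature list for every k.

    Returns (best_k, scores); ties are resolved toward the smaller k.
    """
    scores = []
    for k in range(1, len(ranked) + 1):
        cols = ranked[:k]
        X_k = [[row[c] for c in cols] for row in X]
        scores.append(loo_accuracy(X_k, y))
    best_k = scores.index(max(scores)) + 1
    return best_k, scores


# Synthetic data (hypothetical): rows are samples, columns are 0/1 flags for
# the presence of three unnamed mutations. Column 0 separates the two classes;
# columns 1 and 2 are uninformative.
X = [
    [1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1],
    [0, 1, 0], [0, 0, 0], [0, 1, 1], [0, 0, 1],
]
y = ["severe"] * 4 + ["mild"] * 4
ranked = [0, 1, 2]  # assumed output of some ranking algorithm

best_k, scores = incremental_feature_selection(X, y, ranked)
print(best_k, scores)
```

In the study, one such sweep would be run per feature list (one list per ranking algorithm), which is why four classifiers and four rule sets are reported. A real pipeline would replace the stand-in classifier and the toy data with the actual models and mutation profiles.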
