Xi Binbin, Jiang Dawei, Li Shuhua, Lon Jerome R, Bai Yunmeng, Lin Shudai, Hu Meiling, Meng Yuhuan, Qu Yimo, Huang Yuting, Liu Wei, Huang Lizhen, Du Hongli
School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China.
Comput Struct Biotechnol J. 2021;19:1976-1985. doi: 10.1016/j.csbj.2021.04.002. Epub 2021 Apr 5.
With SARS-CoV-2 spreading globally, it is important to monitor the virus's variation, the epidemic trends of its haplotype subgroups, and its key mutations over time. Such monitoring is of great significance for developing new vaccines, updating therapeutic drugs, and improving detection methods. The AutoVEM tool developed in the present study can complete mutation detection, haplotype classification, haplotype subgroup epidemic trend analysis, and candidate key mutation analysis for 131,576 SARS-CoV-2 genome sequences in 18 h on a computer with a single CPU core and 2 GB of RAM. Haplotype subgroup epidemic trend analysis of these 131,576 genome sequences further revealed the great significance of 4 previously identified specific sites (C241T, C3037T, C14408T and A23403G) and discovered, for the first time, 6 new highly linked mutation sites (T445C, C6286T, C22227T, G25563T, C26801G and G29645T) that might be related to the infectivity, pathogenicity or host adaptability of SARS-CoV-2. In brief, we propose an integrative method and an efficient automated tool that, for the first time, monitor haplotype subgroup epidemic trends and screen for candidate key mutations in the evolution of SARS-CoV-2 over time. Because the tool is highly efficient, all data can be updated quickly to track the prevalence of previous key mutations and new candidate key mutations. In addition, the combinatorial analysis approach of the present study can serve as a reference for the mutation monitoring of other viruses.
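To make the idea of haplotype subgroup trend analysis concrete, the following is a minimal Python sketch, not AutoVEM's actual implementation. It assumes that a haplotype is simply the string of alleles observed at the four specific sites named in the abstract, and that each genome comes with a collection month. The function names, the input layout and the toy records are all hypothetical.

```python
from collections import Counter, defaultdict

# Sites named in the abstract, as (1-based position, reference allele, mutant allele).
SITES = [(241, "C", "T"), (3037, "C", "T"), (14408, "C", "T"), (23403, "A", "G")]


def haplotype(alleles):
    """Concatenate the alleles observed at SITES; '?' marks a missing call."""
    return "".join(alleles.get(pos, "?") for pos, _, _ in SITES)


def monthly_frequencies(samples):
    """Summarize haplotype frequencies per collection month.

    samples: iterable of (collection_month, {position: allele}) pairs
             (a hypothetical input layout chosen for this sketch).
    Returns {month: {haplotype: fraction of that month's genomes}}.
    """
    counts = defaultdict(Counter)
    for month, alleles in samples:
        counts[month][haplotype(alleles)] += 1
    return {
        month: {hap: n / sum(c.values()) for hap, n in c.items()}
        for month, c in counts.items()
    }


# Toy, made-up records purely for illustration (not data from the study).
toy = [
    ("2020-01", {241: "C", 3037: "C", 14408: "C", 23403: "A"}),
    ("2020-03", {241: "T", 3037: "T", 14408: "T", 23403: "G"}),
    ("2020-03", {241: "T", 3037: "T", 14408: "T", 23403: "G"}),
    ("2020-03", {241: "C", 3037: "C", 14408: "C", 23403: "A"}),
]
freqs = monthly_frequencies(toy)
```

Plotting each haplotype's fraction against the month would give a trend curve of the kind the abstract describes. A site whose mutant allele rises together with an expanding haplotype would then be a candidate for closer study.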