Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, West Bengal, India.
Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India.
Brief Bioinform. 2021 Mar 22;22(2):1106-1121. doi: 10.1093/bib/bbab025.
Whole-genome analysis of SARS-CoV-2 is important for characterising its genetic diversity, and accurate detection of the virus is required for correct diagnosis. To address both needs, we first analysed 10 664 publicly available complete or near-complete SARS-CoV-2 genomes from 73 countries to find mutation points in the coding regions, categorised as substitutions, deletions, insertions and single nucleotide polymorphisms (SNPs), both globally and country-wise. Multiple sequence alignment was performed in the presence of the reference sequence from NCBI. Once the alignment was done, a consensus sequence was built to analyse each genomic sequence and identify the unique mutation points, yielding 7209 substitutions, 11700 deletions, 119 insertions and 53 SNPs globally. Second, within these categories, the mutations unique to each country were determined with respect to the other 72 countries. For India, 385 substitutions, 867 deletions, 1 insertion and 11 SNPs are unique among its 566 SARS-CoV-2 genomes, while 458, 1343, 8 and 52 mutation points in the respective categories are shared with other countries. Considering mutations found in more than 10% of the virus population, the most frequent mutation points common to India and the rest of the world are L37F, P323L, F506L, S507G, D614G and Q57H, spread across NSP6, RdRp, Exon, Spike and ORF3a. For India, the other most frequent mutation points are T1198K, A97V, T315N and P13L in NSP3, RdRp, Spike and ORF8, respectively. These mutations are further visualised in protein structures, and phylogenetic analysis has been performed to show the diversity among the virus genomes. Third, a web application is provided for searching mutation points globally and country-wise. Finally, we have identified a potential conserved region as a target, which lies in the coding region of ORF1ab, specifically in the NSP6 gene.
Subsequently, we have provided primers and probes designed from that conserved region so that they can be used for detecting SARS-CoV-2.
Contact: indrajit@nitttrkol.ac.in
Supplementary information: Supplementary data are available at http://www.nitttrkol.ac.in/indrajit/projects/COVID-Mutation-10K.
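The alignment-based steps summarised in the abstract (consensus building, per-genome mutation calling, and separating country-unique from shared mutation points) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequences, the use of alignment columns as positions, and the choice to compare each genome with the aligned reference row are all assumptions made for illustration. The SNP category is left out because the abstract does not state how it is distinguished from substitutions.

```python
from collections import Counter

GAP = "-"


def consensus(aligned):
    """Return the most common character in each alignment column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def call_mutations(ref, seq):
    """Compare one aligned genome with the aligned reference row.

    Returns a set of (kind, column, ref_char, alt_char) tuples. Positions are
    1-based alignment columns, not reference coordinates (an assumption made
    to keep the sketch short).
    """
    mutations = set()
    for col, (r, s) in enumerate(zip(ref, seq), start=1):
        if r == s or s == "N":  # assumption: ambiguous bases are ignored
            continue
        if r == GAP:
            mutations.add(("insertion", col, r, s))
        elif s == GAP:
            mutations.add(("deletion", col, r, s))
        else:
            mutations.add(("substitution", col, r, s))
    return mutations


def split_unique_shared(calls_by_country):
    """For each country, split its mutations into unique and shared sets."""
    unique, shared = {}, {}
    for country, muts in calls_by_country.items():
        others = set().union(
            *(m for c, m in calls_by_country.items() if c != country)
        )
        unique[country] = muts - others
        shared[country] = muts & others
    return unique, shared


# Hypothetical toy alignment: one reference row and four genomes.
REF = "ACGT-ACG"
records = [
    ("India", "ACTT-ACG"),  # substitution G>T at column 3
    ("India", "ACGTTACG"),  # insertion of T at column 5
    ("Italy", "ACTT-ACG"),  # same substitution as the first genome
    ("Italy", "ACGT-A-G"),  # deletion of C at column 7
]

calls_by_country = {}
for country, seq in records:
    calls_by_country.setdefault(country, set()).update(call_mutations(REF, seq))

unique, shared = split_unique_shared(calls_by_country)
cons = consensus([REF] + [seq for _, seq in records])
```

In this toy input, the substitution at column 3 appears in both countries and therefore lands in `shared`, whereas the insertion and the deletion each end up in the `unique` set of a single country. Real genomes would first need an actual multiple sequence alignment, which the sketch takes as given.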