Saif Rashid, Mahmood Tania, Ejaz Aniqa, Zia Saeeda, Qureshi Abdul Rasheed
Decode Genomics, 323-D, Punjab University Employees Housing Scheme (II), Lahore, Pakistan.
Department of Sciences and Humanities, National University of Computer and Emerging Sciences, Lahore, Pakistan.
Gene Rep. 2021 Jun;23:101139. doi: 10.1016/j.genrep.2021.101139. Epub 2021 Apr 15.
The first 784 SARS-CoV-2 whole-genome sequences submitted to the NCBI Virus database were selected for phylogenetic analysis to examine their similarity to two sequenced Pakistani coronavirus strains, accessions MT240479 and MT262993. MT240479 (Gilgit1-Pak) clustered closest to MT184913 (CruiseA-USA), while MT262993 (Manga-Pak) neighboured the MT039887 (WI-USA) strain. These strains were then chosen for variant-calling analysis, with the reference genome NC_045512 as the out-group, to construct the final cladogram and to estimate evolutionary distances with PAUP software. The two Pakistani strains, each 29,836 bases long, were compared with MT263429 (WI-USA; 29,889 bases) and MT259229 (Wuhan, P.R. China; 29,864 bases). The whole-genome variant-calling pipeline revealed 31 variants across the two Pakistani strains: Manga-Pak differed from the USA strain by 2 deletions and 7 SNPs and from the Chinese strain by 2 deletions and 2 SNPs, while Gilgit1-Pak differed from the USA strain by 10 SNPs and from the Chinese strain by 8 SNPs. These variants lie in three genes with roles in viral replication/translation, host innate immunity and viral capsid formation, respectively. These novel variants may be one of the reasons for the low mortality in Pakistan, which had recorded 385 deaths by May 01, 2020, compared with 63,871 in the USA and 4,633 in P.R. China. However, the functional characterization of these variants and their integration with other viral proteins, together with variability in the human receptors (ACE2 and NRP1), may also help explain the unusual COVID-19 statistics in Pakistan; these possibilities need further confirmatory studies. Moreover, the mutated N and ORF1a proteins of the Pakistani strains were analyzed by 3D structure modeling, which adds another dimension for comparing these alterations at the amino acid level.
In a nutshell, these novel variants are correlated with the reduced COVID-19 mortality in Pakistan, although more robust results would require wet-lab experimentation. The findings also give insight into the genomic landscape of these indigenous strains, which could inform the development of diagnostic kits, vaccines and therapeutic interventions.
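
The variant total quoted in the abstract can be cross-checked against the per-comparison counts it lists. The sketch below is only an arithmetic check: the counts are copied from the abstract, while the dictionary layout and the `tally` helper are hypothetical names introduced here and are not part of the authors' pipeline. It assumes the reported total is the simple sum over the four comparisons; the abstract does not say whether any variant is shared between comparisons.

```python
# Variant counts copied from the abstract. The data layout is an assumption
# made for illustration, not the authors' pipeline output format.
REPORTED = {
    ("Manga-Pak", "USA"): {"deletions": 2, "snps": 7},
    ("Manga-Pak", "China"): {"deletions": 2, "snps": 2},
    ("Gilgit1-Pak", "USA"): {"deletions": 0, "snps": 10},
    ("Gilgit1-Pak", "China"): {"deletions": 0, "snps": 8},
}


def tally(counts):
    """Return (total variants, totals per Pakistani strain, totals per variant type)."""
    per_strain = {}
    per_type = {}
    for (strain, _compared_with), kinds in counts.items():
        for kind, n in kinds.items():
            per_strain[strain] = per_strain.get(strain, 0) + n
            per_type[kind] = per_type.get(kind, 0) + n
    return sum(per_strain.values()), per_strain, per_type


total, per_strain, per_type = tally(REPORTED)
print(total, per_strain, per_type)
```

Under that assumption the sum agrees with the 31 variants stated in the abstract, with 13 attributed to Manga-Pak and 18 to Gilgit1-Pak.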