Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy.
Department of Physiopathology and Transplantation, University of Milan, Milan, Italy; Don C. Gnocchi Foundation ONLUS, IRCCS, Milan, Italy.
Infect Genet Evol. 2020 Sep;83:104353. doi: 10.1016/j.meegid.2020.104353. Epub 2020 May 5.
In December 2019, a novel human-infecting coronavirus (SARS-CoV-2) was recognized in China. In a few months, SARS-CoV-2 has caused thousands of disease cases and deaths in several countries. Phylogenetic analyses indicated that SARS-CoV-2 clusters with SARS-CoV in the Sarbecovirus subgenus, and viruses related to SARS-CoV-2 were identified in bats and pangolins. Coronaviruses have long and complex genomes with high plasticity in terms of gene content. To date, the coding potential of SARS-CoV-2 remains partially unknown. We thus used available sequences of bat and pangolin viruses to determine the selective events that shaped the genome structure of SARS-CoV-2 and to assess its coding potential. By searching for signals of significantly reduced variability at synonymous sites (dS), we identified six genomic regions, one of which corresponds to the programmed -1 ribosomal frameshift. The most prominent signal of dS reduction was observed within the E gene. A genome-wide analysis of conserved RNA structures indicated that this region harbors a putative functional RNA element that is shared with the SARS-CoV lineage. Additional signals of reduced dS indicated the presence of internal ORFs. Whereas the presence of ORF9a (internal to N) was previously proposed by homology with a well-characterized protein of SARS-CoV, ORF3h (h for hypothetical, within ORF3a) was not previously described. The predicted product of ORF3h has 90% identity with the corresponding predicted product of SARS-CoV and displays features suggestive of a viroporin. Finally, analysis of the putative ORF10 revealed high dN/dS (3.82) in SARS-CoV-2 and related coronaviruses. In the SARS-CoV lineage, the ORF is predicted to encode a truncated protein and is neutrally evolving. These data suggest that ORF10 encodes a functional protein in SARS-CoV-2 and that positive selection is driving its evolution. Experimental analyses will be necessary to validate and characterize the coding and non-coding functional elements we identified.