3780 Pelham Drive, Mobile, AL 36619, USA.
Infect Genet Evol. 2021 Aug;92:104858. doi: 10.1016/j.meegid.2021.104858. Epub 2021 Apr 18.
The coronaviruses (CoVs), including SARS-CoV-2, the agent of the ongoing deadly COVID-19 (coronavirus disease 2019) pandemic, represent a highly complex and diverse class of RNA viruses with large genomes, complex gene repertoires, and intricate transcriptional and translational mechanisms. The 3'-terminal one-third of the genome encodes four structural proteins, namely spike, envelope, membrane, and nucleocapsid, interspersed with genes for accessory proteins that are largely nonstructural. These accessory proteins are called 'open reading frame' (ORF) proteins and carry alphanumerical designations that are neither consistent nor sequential. Here, I report a comparative study of these ORF proteins, which are mainly encoded in two gene clusters: one between the Spike and the Envelope genes, and one between the Membrane and the Nucleocapsid genes. For brevity and focus, greater emphasis was placed on the first cluster, collectively designated the 'orf3 region' for ease of reference. Overall, an apparently diverse set of ORFs, such as ORF3a, ORF3b, ORF3c, ORF3d, ORF4 and ORF5, which are not necessarily numbered in that order on all CoV genomes, was analyzed along with other ORFs. Unexpectedly, neither the gene order nor the naming of the ORFs was fully conserved, even among members of a single genus. These studies also uncovered hitherto unrecognized orf genes in alternative translational frames, some encoding potentially novel polypeptides and others encoding polypeptides highly similar to known ORFs. Finally, several options for an inclusive and systematic numbering are proposed, not only for the orf3 region but also for the other orf genes in the viral genome, in an effort to regularize the apparently confusing names and orders. Regardless of which system ultimately proves most acceptable, I hope this treatise will initiate an informed discourse in this area.
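As a rough illustration of what a search for orf genes in alternative translational frames involves, the following minimal sketch scans the three forward reading frames of a nucleotide sequence for ATG-to-stop ORFs. It is not the method used in the study: the function name `find_orfs`, the `min_codons` threshold, and the toy sequence are all hypothetical, and real analyses would also have to consider non-canonical start codons, subgenomic RNA context, and homology to known proteins.

```python
# Illustrative sketch only: a toy forward-strand ORF scan,
# not the pipeline used in the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end, orf_sequence) for each ATG...stop ORF.

    frame is 0, 1 or 2 on the forward strand; start and end are 0-based,
    end-exclusive, and the span includes the stop codon.
    min_codons counts sense codons (stop excluded); it is a hypothetical
    threshold chosen for illustration.
    """
    # Accept RNA input by mapping U to T.
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3, seq[start:i + 3]))
                start = None
    return orfs


# Toy sequence (made up) containing two overlapping ORFs in different frames.
toy = "ATGGCATGAAATAGTAA"
for frame, start, end, orf in find_orfs(toy):
    print(frame, start, end, orf)
```

In this toy input, the ORF in frame 2 begins inside the ORF in frame 0, which is the kind of overlap that a scan restricted to a single frame would miss.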