National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA.
Center for Information Technology, National Institutes of Health, Bethesda, Maryland, USA.
mBio. 2023 Apr 25;14(2):e0040823. doi: 10.1128/mbio.00408-23. Epub 2023 Apr 5.
Viruses with large, double-stranded DNA genomes captured the majority of their genes from their hosts at different stages of evolution. The origins of many virus genes are readily detected through significant sequence similarity with cellular homologs. This is the case, in particular, for virus enzymes, such as DNA and RNA polymerases or nucleotide kinases, that retain their catalytic activity after capture by an ancestral virus. However, a large fraction of virus genes have no readily detectable cellular homologs, so their origins remain enigmatic. We explored the potential origins of such proteins encoded in the genomes of orthopoxviruses, a thoroughly studied virus genus that includes major human pathogens. To this end, we used AlphaFold2 to predict the structures of all 214 proteins encoded by orthopoxviruses. Among the proteins of unknown provenance, structure prediction yielded clear indications of origin for 14 of them and validated several inferences previously made through sequence analysis. A notable emerging trend is the exaptation of enzymes from cellular organisms for nonenzymatic, structural roles in virus reproduction. This exaptation is accompanied by the disruption of catalytic sites and by a drastic overall divergence that precludes homology detection at the sequence level. The 16 orthopoxvirus proteins found to be inactivated enzyme derivatives include the poxvirus replication processivity factor A20, an inactivated NAD-dependent DNA ligase; the major core protein A3, an inactivated deubiquitinase; and F11, an inactivated prolyl hydroxylase, among other similar cases. For nearly one-third of the orthopoxvirus virion proteins, no significantly similar structures were identified, suggesting exaptation followed by major structural rearrangement that yielded unique protein folds.
Protein structures are more strongly conserved in evolution than amino acid sequences are. Comparative structural analysis is therefore particularly important for inferring the origins of viral proteins, which typically evolve at high rates. We used AlphaFold2, a powerful protein structure modeling method, to model the structures of all orthopoxvirus proteins and compared them to all available protein structures. We discovered multiple cases in which host enzymes were recruited for structural roles in viruses, accompanied by the disruption of catalytic sites. However, many viral proteins appear to have evolved unique structural folds.