Forgia Marco, Chiapello M, Daghino Stefania, Pacifico D, Crucitti D, Oliva D, Ayllon M, Turina M
Institute for Sustainable Plant Protection (IPSP), CNR, Strada delle Cacce 73, Torino 10135, Italy.
Institute of Biosciences and Bioresources (IBBR), CNR, Corso Calatafimi 414, Palermo 90129, Italy.
Virus Evol. 2022 Apr 23;8(1):veac038. doi: 10.1093/ve/veac038. eCollection 2022.
High-throughput sequencing has allowed the discovery of many new viruses and viral organizations, increasing our comprehension of virus origin and evolution. Most RNA viruses are currently characterized through similarity searches of annotated virus databases. This approach limits the ability to detect completely new virus-encoded proteins with no detectable similarity to existing ones, i.e. ORFan proteins. A strong indication that an ORFan in a metatranscriptome is of viral origin is the lack of DNA corresponding to the assembled RNA sequence in the biological sample. Furthermore, sequence homology among ORFans and evidence of co-occurrence of these ORFans in specific host individuals provide further indication of a viral origin. Here, we use this theoretical framework to report the finding of three conserved clades of protein-coding RNA segments without a corresponding DNA in fungi. Protein sequence and structural alignments suggest that these proteins are distantly related to viral RNA-dependent RNA polymerases (RdRP). In these new putative viral RdRP clades, no GDD catalytic triad is present; the most common putative catalytic triad is NDD, and one clade carries GDQ, a triad previously unreported at that site. SDD, HDD, and ADD are also represented. For most members of these three clades, we were able to associate a second genomic segment, coding for a protein of unknown function. We provisionally named this new group of viruses ormycovirus. Interestingly, all the members of one of these sub-clades (gammaormycovirus) accumulate more minus-sense RNA than plus-sense RNA during infection.
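The selection logic described in the abstract (no database similarity, no corresponding DNA, co-occurrence across host individuals, and strand bias) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Contig` record, its field names, the `max_dna_reads` threshold, and the rule that co-occurring segments must share exactly the same set of samples are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical summary of one assembled RNA contig (all fields are assumptions)."""
    name: str
    has_db_hit: bool     # True if a similarity search found any annotated protein
    dna_reads: int       # reads from a DNA library of the same sample mapping to the contig
    plus_reads: int      # RNA reads mapping in plus (coding) sense
    minus_reads: int     # RNA reads mapping in minus sense
    samples: frozenset   # host isolates in which the contig was detected


def is_candidate_orfan(contig, max_dna_reads=0):
    """An ORFan candidate has no database hit and no supporting DNA reads."""
    return (not contig.has_db_hit) and contig.dna_reads <= max_dna_reads


def co_occurring_pairs(contigs):
    """Return name pairs of ORFan candidates found in exactly the same samples."""
    candidates = [c for c in contigs if is_candidate_orfan(c)]
    pairs = []
    for i, first in enumerate(candidates):
        for second in candidates[i + 1:]:
            # Identical, non-empty sample sets hint at segments of one genome.
            if first.samples and first.samples == second.samples:
                pairs.append((first.name, second.name))
    return pairs


def minus_strand_fraction(contig):
    """Fraction of mapped RNA reads in minus sense, or None if nothing mapped."""
    total = contig.plus_reads + contig.minus_reads
    return contig.minus_reads / total if total else None
```

In this sketch, a minus-strand fraction above 0.5 would correspond to the pattern reported for the gammaormycovirus members; real data would need tolerance for partial co-occurrence and low-level DNA contamination.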