Department of Microbiome Science, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany.
Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia.
Viruses. 2021 Jun 18;13(6):1164. doi: 10.3390/v13061164.
Viruses, far from being just parasites that affect their hosts' fitness, are major players in any microbial ecosystem. Despite their abundance, viruses, and bacteriophages in particular, remain largely unknown: only about 20% of the sequences obtained from viral community DNA surveys can be annotated by comparison with public databases. To shed light on this genetic dark matter, we extended the search for orthologous groups as potential markers of viral taxonomy beyond bacteriophages to include eukaryotic viruses, establishing a set of 31,150 ViPhOGs (Eukaryotic Viruses and Phages Orthologous Groups). To do this, we examined the non-redundant viral diversity stored in public databases, predicted proteins in genomes lacking such annotation, and used all annotated and predicted proteins to identify potential protein domains. Domains and unannotated regions were clustered into orthologous groups using cogSoft. Finally, we used a random forest implementation to classify genomes by taxonomy and found that the presence or absence of ViPhOGs is significantly associated with their taxonomy. Furthermore, we established a set of 1457 ViPhOGs that, given their importance for the classification, could be considered markers or signatures for the different taxonomic groups defined by the ICTV at the order, family, and genus levels.
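The idea of scoring orthologous groups by how well their presence or absence separates taxonomic groups can be illustrated with a small sketch. This is not the authors' pipeline and it does not train a random forest: it builds a binary presence/absence matrix and ranks each group by information gain, the kind of split criterion that decision trees in a random forest typically rely on, used here as a simplified stand-in for feature importance. The genome names, family labels, and group IDs are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: genome -> (family label, set of orthologous-group IDs).
# Every identifier and label here is invented; none comes from the ViPhOG set.
GENOMES = {
    "genome_A": ("FamilyX", {"OG1", "OG2"}),
    "genome_B": ("FamilyX", {"OG1", "OG3"}),
    "genome_C": ("FamilyY", {"OG2", "OG4"}),
    "genome_D": ("FamilyY", {"OG4"}),
}


def presence_absence_matrix(genomes):
    """Return (sorted group IDs, labels, one 0/1 row per genome)."""
    groups = sorted(set().union(*(ogs for _, ogs in genomes.values())))
    labels, rows = [], []
    for name in sorted(genomes):
        label, ogs = genomes[name]
        labels.append(label)
        rows.append([1 if g in ogs else 0 for g in groups])
    return groups, labels, rows


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(column, labels):
    """Entropy reduction obtained by splitting genomes on one 0/1 column."""
    present = [lab for v, lab in zip(column, labels) if v == 1]
    absent = [lab for v, lab in zip(column, labels) if v == 0]
    total = len(labels)
    weighted = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - weighted


def rank_groups(genomes):
    """Rank groups by information gain, ties broken by group ID."""
    groups, labels, rows = presence_absence_matrix(genomes)
    gains = {
        g: information_gain([row[i] for row in rows], labels)
        for i, g in enumerate(groups)
    }
    return sorted(gains.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for group, gain in rank_groups(GENOMES):
        print(f"{group}\t{gain:.3f}")
```

In this toy data, the groups found in only one family rank highest, while a group shared evenly between both families carries no information. A real analysis would use a trained ensemble and held-out evaluation rather than a single-feature score.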