Institute for Chemistry and Biology of the Marine Environment, Carl-von-Ossietzky-Str. 9-11, 26111 Oldenburg, Germany.
Viruses. 2023 Apr 19;15(4):1007. doi: 10.3390/v15041007.
Recent years have seen major changes in the classification criteria and taxonomy of viruses. The current classification scheme, also called the "megataxonomy of viruses", recognizes six viral realms, each defined by the presence of viral hallmark genes (VHGs). Within the realms, viruses are classified into hierarchical taxa, ideally defined by the phylogeny of their shared genes. To detect shared genes, viruses must first be clustered, and there is currently a need for tools that assist with virus clustering and classification. Here, VirClust is presented. It is a novel, reference-free tool capable of performing: (i) protein clustering, based on BLASTp and Hidden Markov Model (HMM) similarities; (ii) hierarchical clustering of viruses, based on intergenomic distances calculated from their shared protein content; (iii) identification of core proteins; and (iv) annotation of viral proteins. VirClust has flexible parameters both for protein clustering and for splitting the viral genome tree into smaller genome clusters that correspond to different taxonomic levels. Benchmarking on a phage dataset showed that the genome trees produced by VirClust match the current ICTV classification at the family, subfamily and genus levels. VirClust is freely available as a web service and as a stand-alone tool.
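To make steps (ii) and (iii) of the abstract concrete, the following is a minimal sketch, not VirClust's actual implementation. It assumes each genome has already been reduced to a set of protein cluster identifiers, uses the Jaccard distance as a stand-in for the intergenomic distance (the abstract does not state the formula), and uses plain average-linkage agglomerative clustering cut at a distance threshold. The genome names, protein cluster labels, threshold value, and function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical input: each genome is the set of protein clusters it encodes.
genomes = {
    "phageA": {"PC1", "PC2", "PC3", "PC4"},
    "phageB": {"PC1", "PC2", "PC3", "PC5"},
    "phageC": {"PC7", "PC8", "PC9"},
    "phageD": {"PC7", "PC8", "PC10"},
}


def jaccard_distance(a, b):
    """Assumed distance: 1 - |shared protein clusters| / |all protein clusters|."""
    union = a | b
    if not union:
        return 1.0  # two empty genomes are treated as maximally distant
    return 1.0 - len(a & b) / len(union)


def cluster_genomes(genomes, threshold):
    """Average-linkage agglomerative clustering.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold`, which plays the role of cutting the genome tree at a
    chosen level.
    """
    clusters = [[name] for name in sorted(genomes)]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        dists = [jaccard_distance(genomes[x], genomes[y]) for x in c1 for y in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return sorted(sorted(c) for c in clusters)


def core_proteins(genomes, cluster):
    """Protein clusters present in every genome of a genome cluster."""
    return set.intersection(*(genomes[name] for name in cluster))


# The threshold is an arbitrary illustrative value, not a VirClust default.
clusters = cluster_genomes(genomes, threshold=0.6)
cores = {tuple(c): core_proteins(genomes, c) for c in clusters}
```

In this toy dataset, a stricter threshold yields smaller genome clusters and a looser one yields larger clusters, which mirrors the idea of tuning the tree split to different taxonomic levels.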