Lima-Mendez Gipsi, Van Helden Jacques, Toussaint Ariane, Leplae Raphaël
Service de Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles, Bruxelles, Belgium.
Mol Biol Evol. 2008 Apr;25(4):762-77. doi: 10.1093/molbev/msn023. Epub 2008 Jan 29.
Bacteriophage genomes show pervasive mosaicism, indicating the importance of horizontal gene exchange in their evolution. Phage genomes represent unique combinations of modules, each of them with a different phylogenetic history. The traditional classification, based on a variety of criteria such as nucleic acid type (single/double-stranded DNA/RNA), morphology, and host range, appeared inconsistent with sequence analyses. With the genomic era, an ever-increasing number of sequenced phages cannot be classified, in part due to a lack of morphological information and in part to the intrinsic inability of tree-based methods to deal efficiently with mosaicism. This problem led some virologists to call for a moratorium on the creation of additional taxa in the order Caudovirales, so that virologists could discuss classification schemes that might better suit phage evolution. In this context, we propose a framework for a reticulate classification of phages based on gene content. Starting from gene families, we build a weighted graph, where nodes represent phages and edges represent phage-phage similarities in terms of shared genes. We then apply various measures of graph topology to analyze the resulting graph. Most double-stranded DNA phages are found in a single connected component. The values of the clustering coefficient and closeness distinguish temperate from virulent phages, whereas chimeric phages are characterized by a high betweenness coefficient. We apply a 2-step clustering method to this graph to generate a reticulate classification of phages: each phage is associated with a membership vector, which quantitatively characterizes its membership in each of the clusters. Furthermore, we cluster genes based on their "phylogenetic profiles" to define "evolutionary cohesive modules." In virulent phages, evolutionary modules span several functional categories, whereas in temperate phages they correspond better to functional modules.
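The graph construction and topology measures described above can be illustrated with a minimal Python sketch on invented toy data. It makes several assumptions that are not taken from the study: the edge weight is simply the number of shared gene families (the study's actual similarity score may differ), clustering coefficient, closeness and betweenness are computed on unweighted hop distances, and `membership_vector` is a hypothetical stand-in (fraction of a phage's edge weight pointing into each cluster of an assumed first-step partition), not the 2-step clustering itself.

```python
from collections import deque
from itertools import combinations

# Invented toy data: phage -> set of gene-family identifiers.
# "X" shares families with both groups, mimicking a chimeric genome.
GENOMES = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2", "f4"},
    "C": {"f2", "f3", "f4"},
    "X": {"f3", "f7"},
    "D": {"f7", "f8", "f9"},
    "E": {"f8", "f9"},
}


def build_graph(genomes):
    """Nodes are phages; assumed edge weight = number of shared gene families."""
    graph = {p: {} for p in genomes}
    for a, b in combinations(genomes, 2):
        shared = len(genomes[a] & genomes[b])
        if shared:
            graph[a][b] = graph[b][a] = shared
    return graph


def clustering_coefficient(graph, node):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def closeness(graph, node):
    """Closeness from BFS hop distances within the node's connected component."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(graph):
    """Brandes' algorithm on unweighted hop distances (undirected, unnormalised)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair was counted from both ends


def membership_vector(graph, node, clusters):
    """Hypothetical membership: share of the node's edge weight pointing into each cluster."""
    total = sum(graph[node].values())
    return {
        name: (sum(w for nbr, w in graph[node].items() if nbr in members) / total if total else 0.0)
        for name, members in clusters.items()
    }


graph = build_graph(GENOMES)
clusters = {"c1": {"A", "B", "C", "X"}, "c2": {"D", "E"}}  # assumed first-step partition
bc = betweenness(graph)
for p in sorted(graph):
    print(p, round(clustering_coefficient(graph, p), 3), round(closeness(graph, p), 3), bc[p])
print(membership_vector(graph, "X", clusters))
```

On this invented graph the bridging phage X gets the largest betweenness (6.0) and a mixed membership vector (about 2/3 in c1, 1/3 in c2), in line with the qualitative pattern described for chimeric phages; it is an illustration only, not a reproduction of the study's data or method.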
Moreover, although modules cover only a fraction of all phage genes, phage groups can be distinguished by their different combinations of modules, which serve as the basis for a higher-level reticulate classification. These 2 classification schemes provide an automatic and dynamic way of representing the relationships within the phage population and can be extended to include newly sequenced phage genomes, as well as other types of genetic elements.
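The gene-side idea (phylogenetic profiles, modules, and phages described by their module combinations) can be sketched in Python as well. This is a minimal sketch on invented data under stated assumptions: a profile is the presence/absence vector of a gene family across phages, profiles are compared with Jaccard similarity and grouped by single linkage at a hypothetical threshold of 0.6, and a phage is assumed to "carry" a module when it has at least half of the module's families. None of these choices is claimed to be the study's procedure.

```python
from itertools import combinations

# Invented toy data: phage -> set of gene families.
GENOMES = {
    "P1": {"g1", "g2", "g3", "g4"},
    "P2": {"g1", "g2", "g3", "g5"},
    "P3": {"g4", "g5", "g6", "g7"},
    "P4": {"g5", "g6", "g7"},
    "P5": {"g1", "g2", "g6", "g7"},
}


def phylogenetic_profiles(genomes):
    """Presence/absence (0/1) vector of each gene family across the sorted phages."""
    phages = sorted(genomes)
    families = sorted(set().union(*genomes.values()))
    return {f: tuple(int(f in genomes[p]) for p in phages) for f in families}


def jaccard(a, b):
    """Jaccard similarity of two 0/1 profiles."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


def cluster_profiles(profiles, threshold):
    """Single-linkage grouping via union-find; singletons are not kept as modules."""
    parent = {f: f for f in profiles}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for f in profiles:
        groups.setdefault(find(f), []).append(f)
    return sorted(tuple(sorted(g)) for g in groups.values() if len(g) > 1)


def module_signatures(genomes, modules, min_fraction=0.5):
    """Hypothetical rule: a phage carries a module if it has >= min_fraction of its families."""
    return {
        p: tuple(i for i, m in enumerate(modules) if len(fams & set(m)) / len(m) >= min_fraction)
        for p, fams in genomes.items()
    }


def group_by_signature(signatures):
    """Phages sharing the same combination of modules fall into the same group."""
    groups = {}
    for p, sig in sorted(signatures.items()):
        groups.setdefault(sig, []).append(p)
    return groups


profiles = phylogenetic_profiles(GENOMES)
modules = cluster_profiles(profiles, threshold=0.6)
signatures = module_signatures(GENOMES, modules)
print(modules)
print(group_by_signature(signatures))
```

In this toy run, g4 and g5 end up in no module, echoing the point that modules cover only part of the gene pool, while P5 is the only phage carrying both modules, so its module combination sets it apart from the other two groups.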