Lopes Anne, Tavares Paulo, Petit Marie-Agnès, Guérois Raphaël, Zinn-Justin Sophie
CEA, iBiTecS, Gif-sur-Yvette, F-91191 Paris, France.
BMC Genomics. 2014 Nov 27;15(1):1027. doi: 10.1186/1471-2164-15-1027.
The genetic diversity observed among bacteriophages remains a major obstacle to the identification of homologs and the comparison of their functional modules. In the structural module, although several classes of homologous proteins contributing to the head and tail structure can be detected, proteins of the head-to-tail connection (or neck) are generally more divergent. Yet molecular analyses of a few tailed phages belonging to different morphological classes suggested that only a limited number of structural solutions are used to produce a functional virion. To test this hypothesis and analyze protein diversity at the virion neck, we developed a specific computational strategy to cope with sequence divergence in phage proteins. We searched for homologs of a set of proteins encoded in the structural module using a phage learning database.
We show that, using a combination of iterative profile-profile comparison and gene context analyses, we can identify a set of head, neck and tail proteins in most tailed bacteriophages of our database. Classification of phages based on neck protein sequences delineates 4 Types corresponding to known morphological subfamilies. Further analysis of the most abundant Type 1 yields 10 Clusters characterized by consistent sets of head, neck and tail proteins. We developed Virfam, a webserver that automatically identifies proteins of the phage head-neck-tail module and assigns phages to the most closely related cluster of phages. This server was tested against 624 new phages from the NCBI database. 93% of the tailed and unclassified phages could be assigned to our head-neck-tail based categories, highlighting how broadly representative the identified virion architectures are. Types and Clusters delineate consistent subgroups of Caudovirales, which correlate with several virion properties.
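As a rough illustration of the two ideas named above (gene context filtering followed by assignment to the most closely related cluster), the following minimal sketch may help. It is not the Virfam implementation. The family names, the 20-gene window, the use of a portal gene as anchor, and the Jaccard similarity are all assumptions made for illustration, and the profile-profile comparison step is taken as already done.

```python
from typing import Dict, List, Set, Tuple

# A hit is (protein_family, gene_index_on_genome). Family names are hypothetical.
Hit = Tuple[str, int]


def filter_by_context(hits: List[Hit], anchor_family: str = "portal",
                      window: int = 20) -> Set[str]:
    """Keep families whose gene lies within `window` genes of the anchor gene.

    The anchor family and window size are illustrative assumptions.
    """
    anchors = [idx for fam, idx in hits if fam == anchor_family]
    if not anchors:
        return set()  # no anchor found: nothing can be placed in context
    anchor = anchors[0]
    return {fam for fam, idx in hits if abs(idx - anchor) <= window}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sets of family names."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_cluster(families: Set[str],
                   clusters: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return the reference cluster with the most similar family set."""
    best = max(clusters, key=lambda name: jaccard(families, clusters[name]))
    return best, jaccard(families, clusters[best])


# Toy data: candidate hits for one phage and two hypothetical reference clusters.
hits = [("portal", 10), ("adaptor_A", 12), ("tail_tube", 18), ("adaptor_B", 70)]
clusters = {
    "cluster_A": {"portal", "adaptor_A", "tail_tube"},
    "cluster_B": {"portal", "adaptor_B"},
}

families = filter_by_context(hits)
best_cluster, score = assign_cluster(families, clusters)
print(best_cluster, score)
```

In this toy case the distant `adaptor_B` hit is discarded because it lies outside the window around the portal gene, so the phage is matched to `cluster_A`.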
Our method and webserver can automatically classify most tailed phages, detect their structural module, assign a function to a set of their head, neck and tail genes, provide their morphological subtype and place these phages within a "head-neck-tail" based classification. They should enable analysis of large sets of phage genomes. In particular, they should contribute to the classification of the abundant unknown viruses found on assembled contigs of metagenomic samples.