Institutes of Biology and Medical Sciences, Medical College of Soochow University, Suzhou, 215123, China.
State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, 200433, China.
BMC Genomics. 2019 Nov 21;20(1):886. doi: 10.1186/s12864-019-6234-8.
The genome topology network (GTN) is a new approach for studying the phylogenetics of bacterial genomes by analysing their gene order. The previous GTN tool builds a phylogenetic tree and calculates the different degrees (DD) of various adjacent gene families, but it requires complete genome data and is limited to the gene family level.
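A small sketch can make the gene-order network idea concrete. It assumes each genome has already been reduced to ordered lists of gene-family IDs, one list per contig, so a draft assembly is handled by simply not joining its contigs. The sketch records which families are adjacent and reports each family's number of distinct neighbours (its node degree). The pairwise distance used here, a Jaccard distance on shared adjacencies, is a hypothetical stand-in chosen for illustration. It is not the DD statistic or the tree-building procedure implemented in the GTN tool.

```python
from itertools import combinations


def genome_edges(contigs, circular=False):
    """Collect undirected adjacency edges between gene families in one genome.

    contigs: list of contigs, each a list of gene-family IDs in chromosomal order.
    circular: close each contig into a ring (e.g. a finished circular chromosome).
    """
    edges = set()
    for order in contigs:
        n = len(order)
        if n < 2:
            continue
        steps = n if circular else n - 1
        for i in range(steps):
            a, b = order[i], order[(i + 1) % n]
            if a != b:  # skip tandem copies of the same family
                edges.add(frozenset((a, b)))
    return edges


def node_degrees(edges):
    """Number of distinct neighbouring families for each family (node degree)."""
    deg = {}
    for edge in edges:
        for fam in edge:
            deg[fam] = deg.get(fam, 0) + 1
    return deg


def jaccard_distance(e1, e2):
    """Hypothetical distance: fraction of adjacencies not shared by two genomes."""
    union = e1 | e2
    if not union:
        return 0.0
    return 1.0 - len(e1 & e2) / len(union)


def distance_matrix(genomes):
    """genomes: dict name -> edge set. Returns dict (name_a, name_b) -> distance."""
    names = sorted(genomes)
    return {(a, b): jaccard_distance(genomes[a], genomes[b])
            for a, b in combinations(names, 2)}


if __name__ == "__main__":
    # Toy gene-family orders (hypothetical data, not GBS genomes).
    genomes = {
        "g1": genome_edges([["A", "B", "C", "D", "E"]], circular=True),
        "g2": genome_edges([["A", "B", "D", "C", "E"]], circular=True),
        "g3": genome_edges([["A", "B"], ["C", "D", "E"]]),  # draft: two contigs
    }
    print(node_degrees(genomes["g3"]))
    for pair, dist in distance_matrix(genomes).items():
        print(pair, round(dist, 3))
```

In this toy case g2 differs from g1 by one local inversion (C and D swapped), which changes two of its five adjacencies. A network built on adjacencies is sensitive to exactly this kind of rearrangement, and a sequence-based SNP comparison would not see it.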
In this study, we collected 51 published complete and draft group B Streptococcus (GBS) genomes from the NCBI database as case study data. The phylogenetic tree obtained with the GTN method assigned the genomes to six main clades. Compared with the single nucleotide polymorphism (SNP)-based method, the GTN method showed higher resolution in two clades. The gene families located at unique node connections in these clades were associated with the clusters of orthologous groups (COG) functional categories "[G] Carbohydrate transport and metabolism", "[L] Replication, recombination, and repair" and "[J] Translation, ribosomal structure and biogenesis". Thus, these genes were the major factors affecting the differentiation of these six clades in the phylogenetic tree obtained from the GTN.
The modified GTN can analyze draft genome data and offers more functionality than the previous version. The gene family clustering algorithm embedded in the GTN tool is optimized by introducing the Markov cluster algorithm (MCL) tool to assign genes to functional gene families. A bootstrap test is performed to verify the credibility of the clades, and users can adjust the relationships between clades accordingly. The GTN tool provides additional evolutionary information that usefully complements the SNP-based method. It also makes it easy to obtain the differences, among species or clades, in the connections between a gene and its adjacent genes. The modified GTN tool can be downloaded from https://github.com/0232/Genome_topology_network.
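What the MCL clustering step computes can be shown with a compact, dependency-free sketch of the standard algorithm: add self-loops, column-normalise, then alternate expansion (squaring the matrix) and inflation (raising entries to a power and renormalising) until the matrix stops changing. The sketch assumes gene-to-gene similarities (for example, normalised all-against-all BLAST scores) have already been placed in a symmetric matrix. The inflation value of 2.0 and the convergence thresholds are illustrative defaults, not parameters taken from the GTN tool, which the text describes as using the MCL tool rather than code like this.

```python
def _normalize_columns(m):
    """Scale each column to sum to 1 (in place), turning m into a column-stochastic matrix."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def _matmul(a, b):
    """Plain square matrix product (adequate for small illustrative inputs)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-6):
    """Markov clustering of a symmetric similarity matrix (list of lists).

    Returns clusters as sorted lists of node indices (e.g. genes -> gene families).
    """
    n = len(similarity)
    if n == 0:
        return []
    # Self-loops keep some probability mass on each node.
    m = [[float(similarity[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)                                  # expansion, e = 2
        inflated = [[v ** inflation for v in row] for row in expanded]  # inflation
        _normalize_columns(inflated)
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that still hold mass are attractors; the columns they attract form a cluster.
    clusters, seen = [], set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    # Hypothetical similarities: two groups of three genes joined by one weak hit.
    sim = [[0, 1, 1, 0, 0, 0],
           [1, 0, 1, 0, 0, 0],
           [1, 1, 0, 0.1, 0, 0],
           [0, 0, 0.1, 0, 1, 1],
           [0, 0, 0, 1, 0, 1],
           [0, 0, 0, 1, 1, 0]]
    print(mcl(sim))
```

Inflation is the main tuning knob: larger values sharpen the contrast between strong and weak flows and so split the graph into more, smaller gene families, while values closer to 1 merge them.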