Committee on Evolutionary Biology, The University of Chicago, Chicago, IL, USA.
BMC Genomics. 2011 May 12;12:237. doi: 10.1186/1471-2164-12-237.
The explosion in availability of whole genome data provides the opportunity to build phylogenetic hypotheses from these data as well as to learn more about the genomes themselves. The biological history of genes and genomes can be investigated in the context of the taxonomic history provided by the phylogeny. A phylogenetic hypothesis based on complete genome data is presented for the genus Shewanella (Gammaproteobacteria: Alteromonadales: Shewanellaceae). Nineteen taxa from Shewanella (16 species and 3 additional strains of one species) as well as three outgroup species representing the genera Aeromonas (Gammaproteobacteria: Aeromonadales: Aeromonadaceae), Alteromonas (Gammaproteobacteria: Alteromonadales: Alteromonadaceae) and Colwellia (Gammaproteobacteria: Alteromonadales: Colwelliaceae) are included, for a total of 22 taxa.
Putatively homologous regions were identified across unannotated genomes and tested with phylogenetic analysis. Two genome-wide data-sets were considered: one including only those genomic regions in which all taxa are represented, totaling 3,361,015 aligned nucleotide base-pairs (bp), and a second that additionally includes regions present in only subsets of taxa, totaling 12,456,624 aligned bp. Alignment columns in these large data-sets were then randomly sampled to create smaller data-sets. After the phylogenetic hypothesis was generated, genome annotations were projected onto the DNA sequence alignment to compare the historical hypothesis generated by the phylogeny with the functional hypothesis posited by annotation.
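The data-set construction, column subsampling, and annotation projection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each co-linear region is held as a dict mapping taxon name to an aligned sequence of equal length, that taxa absent from a region are padded with `?` as missing data, and that columns are drawn without replacement (the abstract does not state the sampling scheme). The helper names `concatenate`, `sample_columns`, and `ungapped_to_column` are hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical representation: taxon name -> aligned sequence ('-' marks a gap).
Alignment = Dict[str, str]


def concatenate(blocks: List[Alignment], taxa: List[str], require_all: bool) -> Alignment:
    """Concatenate aligned blocks into one supermatrix.

    require_all=True keeps only blocks in which every taxon appears (like the
    'all taxa represented' data-set); require_all=False keeps every block and
    pads absent taxa with '?' (assumed missing-data symbol).
    """
    out: Dict[str, List[str]] = {t: [] for t in taxa}
    for block in blocks:
        if require_all and not all(t in block for t in taxa):
            continue
        length = len(next(iter(block.values())))
        for t in taxa:
            out[t].append(block.get(t, "?" * length))
    return {t: "".join(parts) for t, parts in out.items()}


def sample_columns(aln: Alignment, k: int, seed: int = 0) -> Alignment:
    """Build a smaller alignment from k columns drawn WITHOUT replacement (an assumption).

    Sampled column indices are sorted so the original column order is kept;
    the same columns are taken for every taxon.
    """
    length = len(next(iter(aln.values())))
    rng = random.Random(seed)
    cols = sorted(rng.sample(range(length), k))
    return {t: "".join(seq[i] for i in cols) for t, seq in aln.items()}


def ungapped_to_column(aligned: str, gaps: str = "-?") -> List[int]:
    """Map each residue of one taxon's aligned sequence to its alignment column.

    Index i of the result is the column holding the i-th non-gap residue, so a
    feature covering residues s..e (0-based, relative to the block start, same
    strand) is projected onto columns result[s]..result[e].
    """
    return [col for col, ch in enumerate(aligned) if ch not in gaps]
```

For example, with one block shared by all three taxa and one block lacking taxon `C`, the complete-taxa data-set keeps 4 columns while the inclusive one keeps 8:

```python
blocks = [
    {"A": "ACGT", "B": "ACGA", "C": "ACTT"},
    {"A": "GG-C", "B": "GGAC"},  # taxon C absent from this block
]
taxa = ["A", "B", "C"]
core = concatenate(blocks, taxa, require_all=True)
full = concatenate(blocks, taxa, require_all=False)
print(len(core["A"]), len(full["A"]))  # 4 8
sub = sample_columns(full, k=5, seed=1)
print(ungapped_to_column("A-CG-T"))  # [0, 2, 3, 5]
```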
Individual phylogenetic analyses of each of the 243 locally co-linear genome regions all failed to recover the genome topology, whereas the smaller data-sets created by randomly sampling the large concatenated alignments all produced the genome topology. It is shown that there is no single orthologous copy of 16S rRNA across the taxa sampled in this study and that the relationships among the multiple copies are consistent with 16S rRNA undergoing concerted evolution. Unannotated whole genome data can provide excellent raw material for generating hypotheses of historical homology, which can be tested with phylogenetic analysis and compared with hypotheses of gene function.