Merkl Rainer, Wiezer Arnim
Institut für Biophysik und Physikalische Biochemie, Universität Regensburg, 93040, Regensburg, Germany.
J Mol Evol. 2009 May;68(5):550-62. doi: 10.1007/s00239-009-9233-6. Epub 2009 May 13.
Phylogenetic analyses based on rRNA or a small set of sequences may fail to resolve closely related prokaryotes. Whole-genome phylogeny uses the largest available sample space. To determine genome similarity precisely, an algorithm for whole-genome phylogeny has to take two aspects into account: (1) gene order conservation is a more precise signal than gene content; and (2) when sequence similarity is used, failures to identify orthologues or the in situ replacement of genes via horizontal gene transfer may give misleading results. GO4genome is a new paradigm based on a detailed analysis of gene function and the location of the respective genes. To characterize genes, the algorithm uses gene ontology, which enables a comparison of function independent of evolutionary relationship. After locally optimal series of gene functions have been identified, their length distribution is used to compute a phylogenetic distance. The outcome is a classification of genomes based on metabolic capabilities and their organization. Thus, effects on genome organization that are not covered by methods of molecular phylogeny can be studied. Genomes of strains belonging to Escherichia coli, Shigella, Streptococcus, Methanosarcina, and Yersinia were analyzed. Differences from the findings of classical methods are discussed.
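The general idea of comparing genomes by runs of functionally matching, co-located genes can be illustrated with a minimal sketch. This is not the GO4genome implementation: the abstract does not give the scoring scheme or the distance formula, so everything below is an assumption made for illustration. Each genome is modelled as a list of genes in chromosomal order, each gene as a set of GO term identifiers. Two genes are treated as functionally equivalent if they share at least one term. Exact contiguous runs stand in for the "locally optimal series", and the hypothetical `distance` function simply turns the run lengths into a coverage-based value between 0 and 1.

```python
from collections import Counter


def same_function(gene_a, gene_b):
    """Assumption: two genes match if they share at least one GO term."""
    return bool(gene_a & gene_b)


def run_lengths(g1, g2, min_len=2):
    """Count maximal runs of consecutive, functionally matching genes.

    Returns a Counter mapping run length -> number of runs of that length.
    Only co-linear runs (same orientation) are considered in this sketch.
    """
    n, m = len(g1), len(g2)
    prev = [0] * (m + 1)  # prev[j]: run length ending at g1[i-2], g2[j-1]
    lengths = Counter()
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if same_function(g1[i - 1], g2[j - 1]):
                cur[j] = prev[j - 1] + 1
        # A run from the previous row is maximal if it was not extended.
        for j in range(1, m + 1):
            if prev[j] >= min_len and (j == m or cur[j + 1] == 0):
                lengths[prev[j]] += 1
        prev = cur
    # Runs that reach the last gene of g1 are maximal by definition.
    for j in range(1, m + 1):
        if prev[j] >= min_len:
            lengths[prev[j]] += 1
    return lengths


def distance(g1, g2, min_len=2):
    """Hypothetical distance: 1 minus the fraction of genes covered by runs."""
    shorter = min(len(g1), len(g2))
    if shorter == 0:
        return 1.0
    lengths = run_lengths(g1, g2, min_len)
    covered = sum(length * count for length, count in lengths.items())
    return 1.0 - min(1.0, covered / shorter)


# Toy genomes with made-up GO identifiers.
genome_a = [{"GO:1"}, {"GO:2"}, {"GO:3"}, {"GO:4"}]
genome_b = [{"GO:1"}, {"GO:2"}, {"GO:9"}, {"GO:3"}, {"GO:4"}]
genome_c = list(reversed(genome_a))

print(run_lengths(genome_a, genome_b))
print(distance(genome_a, genome_b), distance(genome_a, genome_c))
```

In this toy setup, an inserted gene splits one long conserved run into two shorter ones, while a full reversal of gene order leaves no co-linear run at all, even though gene content is identical. That is the kind of order-dependent signal the abstract contrasts with gene content alone.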