Maruyama Osamu, Matsuda Akiko, Kuhara Satoru
Int J Bioinform Res Appl. 2005;1(4):429-46. doi: 10.1504/IJBRA.2005.008446.
In this paper, we propose a method for reconstructing phylogenetic trees of a given set of prokaryote organisms by randomly sampling relatively small oligopeptides of a fixed length from their complete proteomes. For each organism, a vector of frequencies of the sampled oligopeptides is generated and used as a building block in reconstructing a phylogenetic tree. By repeating this procedure, multiple phylogenetic trees are created independently, and a consensus tree of those trees is then built. We have applied our method to a set of 109 organisms, including 16 Archaea, 87 Bacteria, and 6 Eukarya, using around 10% of all the 3,200,000 (20^5) oligopeptides of length 5 in the reconstruction of each single phylogenetic tree. Our consensus tree agrees with the tree of Bergey's Manual in most of the basic taxa. In addition, it has almost the same quality as the trees of the same organisms reconstructed by Qi et al. using all the 20^K oligopeptides of length K = 5 and 6. Thus we conclude that the frequencies of a relatively small number of oligopeptides of length 5, even when those oligopeptides are chosen at random, carry phylogenetic information almost equivalent to the frequencies of all the oligopeptides of length 5 or 6.
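The sampling and frequency-vector steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy proteomes, the use of overlapping windows, and the cosine distance are all assumptions, since the abstract does not state the distance measure or the tree-building algorithm. Tree construction from each distance matrix and the consensus step are left out and would normally be delegated to a phylogenetics package.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues; 20**5 == 3,200,000


def sample_oligopeptides(k, fraction, rng):
    """Pick a random subset of the 20**k possible length-k oligopeptides."""
    total = len(AMINO_ACIDS) ** k
    n = max(1, round(total * fraction))
    peptides = []
    for index in rng.sample(range(total), n):
        # Decode the integer index into a length-k string over the alphabet.
        letters = []
        for _ in range(k):
            index, r = divmod(index, len(AMINO_ACIDS))
            letters.append(AMINO_ACIDS[r])
        peptides.append("".join(letters))
    return peptides


def frequency_vector(proteins, peptides):
    """Relative frequency of each sampled oligopeptide in one proteome.

    Assumption: frequencies are counted over overlapping windows.
    """
    k = len(peptides[0])
    counts = dict.fromkeys(peptides, 0)
    windows = 0
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            windows += 1
            word = seq[i:i + k]
            if word in counts:
                counts[word] += 1
    return [counts[p] / windows if windows else 0.0 for p in peptides]


def cosine_distance(u, v):
    """Assumed distance measure: 1 minus the cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def distance_matrices(proteomes, k, fraction, replicates, seed=0):
    """One distance matrix per independent random sample of oligopeptides."""
    rng = random.Random(seed)
    names = sorted(proteomes)
    matrices = []
    for _ in range(replicates):
        peptides = sample_oligopeptides(k, fraction, rng)
        vectors = {n: frequency_vector(proteomes[n], peptides) for n in names}
        matrices.append(
            [[cosine_distance(vectors[a], vectors[b]) for b in names] for a in names]
        )
    return names, matrices


# Hypothetical toy proteomes (made-up sequences, for illustration only).
toy = {
    "org_A": ["MKVLAAGIVKVLAAG", "MKKLLAAG"],
    "org_B": ["MKVLAAGIVKVLSAG", "MKKLLAAG"],
    "org_C": ["WWPYHHCCWWPYHH", "PYHHCW"],
}
names, mats = distance_matrices(toy, k=2, fraction=0.5, replicates=3)
```

With the setting reported in the abstract (k = 5, fraction of about 0.1), each replicate would draw roughly 320,000 of the 3,200,000 possible pentapeptides. Each resulting matrix would then feed one tree reconstruction, and the resulting trees would be summarized by a consensus tree.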