Department of Chemistry, University of California, Berkeley, CA 94720, USA.
Proc Natl Acad Sci U S A. 2010 Jan 5;107(1):133-8. doi: 10.1073/pnas.0913033107. Epub 2009 Dec 14.
We present a whole-proteome phylogeny of prokaryotes constructed by comparing feature frequency profiles (FFPs) of whole proteomes. Features are l-mers of amino acids, and each organism is represented by a profile of the frequencies of all features. The choice of feature length is critical in the FFP method, and we have developed a procedure for identifying the optimal feature lengths for inferring the phylogeny of prokaryotes (strictly speaking, a proteome phylogeny). Our FFP trees are constructed from the whole proteomes of 884 prokaryotes and 16 unicellular eukaryotes, together with 2 random sequences. To highlight the branching order of major groups, we present a simplified proteome FFP tree of monophyletic classes or phyla with branch support. In our whole-proteome FFP trees, (i) Archaea, Bacteria, Eukaryota, and a random-sequence outgroup are clearly separated; (ii) Archaea and Bacteria form a sister group when the tree is rooted with random sequences; (iii) Planctomycetes, which possess an intracellular membrane compartment, are placed at the basal position of the Bacteria domain; (iv) almost all prokaryotic groups are monophyletic at most taxonomic levels, but the branching order of major groups differs in many places between our proteome FFP tree and trees built with other methods; and (v) previously "unclassified" genomes may be assigned to the most likely taxa. We describe notable similarities and differences between our FFP trees and those based on other methods in the grouping and phylogeny of prokaryotes.
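As a rough illustration of the profile idea described in the abstract, the sketch below counts overlapping amino-acid l-mers across a proteome, normalizes the counts to frequencies, and compares two profiles. The function names `ffp` and `js_divergence` are hypothetical, and the use of Jensen-Shannon divergence as the distance is an assumption: the abstract does not state which distance measure or tree-building step is used, nor how the optimal feature length is selected.

```python
from collections import Counter
from math import log2


def ffp(proteins, l):
    """Relative frequencies of all overlapping l-mers in a list of protein sequences.

    `proteins` is a list of amino-acid strings (one proteome); `l` is the feature length.
    """
    counts = Counter()
    for seq in proteins:
        # Slide a window of width l along each protein; short sequences add nothing.
        for i in range(len(seq) - l + 1):
            counts[seq[i:i + l]] += 1
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles.

    Assumed here as the profile distance; it is not specified in the abstract.
    """
    d = 0.0
    for feature in p.keys() | q.keys():
        pk = p.get(feature, 0.0)
        qk = q.get(feature, 0.0)
        m = (pk + qk) / 2  # mixture frequency of this feature
        if pk:
            d += 0.5 * pk * log2(pk / m)
        if qk:
            d += 0.5 * qk * log2(qk / m)
    return d


# Toy usage with made-up sequences (not real proteomes).
profile_a = ffp(["MKVLAAGMKV", "GAVLK"], 3)
profile_b = ffp(["MKVLSAGMKI", "GAVIK"], 3)
distance = js_divergence(profile_a, profile_b)
```

Computing this distance for every pair of organisms would give a distance matrix that a standard distance-based tree method could consume; that step is omitted here because the abstract does not describe it.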