Wang Minglei, Caetano-Anollés Gustavo
Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
Mol Biol Evol. 2006 Dec;23(12):2444-54. doi: 10.1093/molbev/msl117. Epub 2006 Sep 13.
The majority of proteins consist of multiple domains that are either repeated or combined in a defined order. In this study, we survey the combinations of protein domains, defined at the fold and fold superfamily levels, in 185 genomes from fully sequenced organisms and introduce a method that reconstructs rooted phylogenomic trees from the content and arrangement of domains in proteins at a genomic level. We find that the majority of domain combinations were unique to Archaea, Bacteria, or Eukarya, suggesting that most combinations originated after life had diversified. Domain repeats, and domain repeats within multidomain proteins, increased notably in eukaryotes, mainly at the expense of single-domain and domain-pair proteins. This increase was mostly confined to Metazoa. We also find an unbalanced sharing of domain combinations, which suggests that Eukarya is more closely related to Bacteria than to Archaea, an observation that challenges the widely assumed eukaryote-archaebacterial sisterhood relationship. The occurrence and abundance of the molecular repertoire (interactome) of domain combinations were used to generate phylogenomic trees. These global interactome-based phylogenies described organismal histories satisfactorily, revealing the tripartite nature of life and supporting controversial evolutionary patterns, such as the Coelomata hypothesis, the grouping of plants and animals, and the Gram-positive origin of bacteria. The results strongly suggest that the process of domain combination is not random but constrained by evolution, rejecting the null hypothesis that domain modules combine in the absence of natural selection or an optimality criterion.
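The abstract does not spell out the pipeline, so the following is only a minimal Python sketch, under stated assumptions, of how domain-combination data might be turned into phylogenomic characters and into the unique-versus-shared tally described above. It assumes (hypothetically) that each protein is an ordered tuple of fold superfamily labels, that a "combination" is a pair of adjacent domains (so a pair such as ("E", "E") is a domain repeat), and that abundances are log-scaled into 24 ordered character states. These choices, the toy data, and all function names are illustrative and are not the authors' method; tree search and rooting are not shown.

```python
from collections import Counter
from math import log

# Hypothetical toy data: genome name -> list of proteins, where each protein is an
# ordered tuple of domain labels (standing in for fold superfamily identifiers).
TOY_GENOMES = {
    "archaeon": [("A",), ("A", "B")],
    "bacterium": [("A", "B"), ("C", "D")],
    "eukaryote": [("A", "B"), ("E", "E", "E"), ("C", "D")],
}
TOY_KINGDOMS = {"archaeon": "Archaea", "bacterium": "Bacteria", "eukaryote": "Eukarya"}


def domain_combinations(protein):
    """Ordered pairs of adjacent domains in one protein (assumed definition).

    Single-domain proteins yield no pairs; a pair such as ("E", "E") is a domain repeat.
    """
    return [protein[i:i + 2] for i in range(len(protein) - 1)]


def combination_counts(proteins):
    """Abundance of each domain combination across all proteins of one genome."""
    counts = Counter()
    for protein in proteins:
        counts.update(domain_combinations(protein))
    return counts


def superkingdom_sharing(genomes, kingdom_of):
    """Count combinations unique to one superkingdom versus shared by several."""
    seen = {}  # combination -> set of superkingdoms in which it occurs
    for name, proteins in genomes.items():
        for combo in combination_counts(proteins):
            seen.setdefault(combo, set()).add(kingdom_of[name])
    # Key is a single superkingdom for unique combinations (e.g. "Eukarya"),
    # or the sorted superkingdoms joined by "+" for shared ones.
    return Counter("+".join(sorted(kingdoms)) for kingdoms in seen.values())


def abundance_matrix(genomes, n_states=24):
    """Genome x combination matrix of ordered character states 0..n_states-1.

    Assumed coding: state = round(ln(a + 1) / ln(a_max + 1) * (n_states - 1)),
    where a is a combination's abundance in a genome and a_max is the largest
    abundance anywhere in the matrix; absent combinations get state 0.
    """
    counts = {name: combination_counts(p) for name, p in genomes.items()}
    characters = sorted({combo for c in counts.values() for combo in c})
    # a_max > 0 whenever at least one combination exists, so no division by zero.
    a_max = max((a for c in counts.values() for a in c.values()), default=0)
    matrix = {
        name: [
            round(log(c[combo] + 1) / log(a_max + 1) * (n_states - 1))
            for combo in characters
        ]
        for name, c in counts.items()
    }
    return characters, matrix
```

On the toy data, ("E", "E") is counted only in Eukarya, ("C", "D") is shared by Bacteria and Eukarya, and ("A", "B") occurs in all three superkingdoms. The rows of the abundance matrix would then be handed to a separate tree-building program, which this sketch does not attempt; the log scaling simply keeps very abundant combinations from dominating the ordered character states.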