Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen 72076, Germany.
Protein Sci. 2010 Jan;19(1):124-30. doi: 10.1002/pro.297.
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernible sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
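The clustering idea in the abstract, with domains treated as point masses in a virtual 2D space that attract or repel each other according to pairwise sequence similarity, can be illustrated with a generic force-directed layout. The sketch below is not the authors' software or parameterization. The function name `layout_2d`, the `links` weights, and all constants are hypothetical, and the force law (linear attraction along links, inverse-distance repulsion between every pair) is only one common choice.

```python
import math
import random


def layout_2d(n, links, steps=500, attract=1.0, repel=0.1,
              rate=0.1, max_move=0.05, seed=0):
    """Place n points in 2D with a simple force-directed scheme.

    links maps an index pair (i, j) with i < j to a similarity weight in
    (0, 1]. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Linked pairs attract in proportion to weight * distance;
                # every pair repels with a force that decays as 1 / distance.
                magnitude = attract * links.get((i, j), 0.0) * dist - repel / dist
                fx = magnitude * dx / dist
                fy = magnitude * dy / dist
                force[i][0] += fx
                force[i][1] += fy
                force[j][0] -= fx
                force[j][1] -= fy
        for i in range(n):
            mx = rate * force[i][0]
            my = rate * force[i][1]
            norm = math.hypot(mx, my)
            if norm > max_move:
                # Cap the per-step displacement so the layout stays stable.
                mx *= max_move / norm
                my *= max_move / norm
            pos[i][0] += mx
            pos[i][1] += my
    return pos


# Toy input: two groups of three mutually similar "domains", with no
# detectable similarity between the groups.
toy_links = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
             (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0}
for index, (x, y) in enumerate(layout_2d(6, toy_links)):
    print(index, round(x, 2), round(y, 2))
```

In this toy input the members of each group should settle close together while the two unlinked groups drift apart, which is the behavior the abstract describes for families within a superfamily. A weaker link between the two groups would hold them at an intermediate distance, analogous to the links reported between superfamilies.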