Dayhoff M O
Fed Proc. 1976 Aug;35(10):2132-8.
The organization of proteins into superfamilies based primarily on their sequences is introduced; examples are given of the methods used to cluster the related sequences and to elucidate the evolutionary history of the corresponding genes within each superfamily. Within the framework of this organization, the amount of sequence information currently and potentially available in all living forms can be discussed. The 116 superfamilies already sampled represent possibly 10% of the total number. There are related proteins from many species in all of these superfamilies, suggesting that the origin of a new superfamily is rare indeed. The proteins so far sequenced are so rigorously conserved by the evolutionary process that we would expect to recognize as related the descendants of any protein found in the ancestral vertebrate. The evolutionary history of the thyrotropin-gonadotropin beta chain superfamily is discussed in detail as an example. Some proteins are so constrained in structure that related forms can be recognized in both prokaryotes and eukaryotes. Evolution in these superfamilies can be traced back close to the origin of life itself. From the evolutionary tree of the c-type cytochromes, the identity of the prokaryote types involved in the symbiotic origin of mitochondria and chloroplasts begins to emerge.