Saier M H
Department of Biology, University of California at San Diego, La Jolla, USA.
Microb Comp Genomics. 1996;1(3):129-50. doi: 10.1089/mcg.1996.1.129.
With the advent of megabase genome sequencing, the need for computational analyses increases exponentially. Sequencing errors must be corrected, encoded proteins must be identified, functions must be assigned to these proteins, and distant phylogenetic relationships must be recognized in order to maximize the yield of information obtainable from genome sequencing projects. Both the computer and the human brain have their limitations, but using them in combination, the biologist can vastly extend his or her analytic capabilities. Computer techniques can be used to estimate protein structure, function, biogenesis, and evolution. In this review, the application of available computer programs to several protein families, particularly transport, receptor, and transcriptional regulatory protein families, illustrates our current capabilities and limitations. Although some multidomain protein families are evolutionarily homogeneous, others have mosaic origins. Evidence concerning the nature and frequency of occurrence of domain shuffling, splicing, fusion, deletion, and duplication during evolution of specific protein families is evaluated. It is shown that specific families of enzymes, receptors, transport proteins, and transcriptional regulatory proteins share a common evolutionary origin, frequently diverging in function because of domain splicing and ligation. Some large families arose gradually over evolutionary time, whereas others developed suddenly, due to bursts of intragenic or intergenic (or both) duplication events occurring over relatively short periods of time. It is argued that energy coupling to transport was a late occurrence, superimposed on preexisting mechanisms of solute facilitation. It is also shown that several transport protein families have evolved independently of each other, employing different routes, at different times in evolutionary history, to give topologically similar transmembrane protein complexes.