Fliess Amit, Motro Benny, Unger Ron
Faculty of Life Science, Bar-Ilan University, Ramat-Gan, Israel.
Proteins. 2002 Aug 1;48(2):377-87. doi: 10.1002/prot.10156.
An important question in protein evolution is to what extent proteins may have undergone swaps (switches of domain or fragment order) during evolution. Such events might have occurred in several forms: swaps of short fragments, swaps of structural and functional motifs, or recombination of domains in multidomain proteins. This question is important for the theoretical understanding of the evolution of proteins, and has practical implications for using swaps as a design tool in protein engineering. In order to analyze the question systematically, we conducted a large-scale survey of possible swaps and permutations among all pairs of proteins from the Swissprot database. A swap is defined as a specific kind of sequence mutation between two proteins in which two fragments that appear in both sequences have different relative order in the two sequences. For example, aXbYc and dYeXf are defined as a swap, where X and Y represent sequence fragments that switched their order. Identifying such swaps is difficult using standard sequence comparison packages. One of the main problems in the analysis stems from the fact that many sequences contain repeats, which may be identified as false-positive swaps. We have used two different approaches to detect pairs of proteins with swaps. The first approach is based on the predefined list of domains in Pfam. We identified all the proteins that share at least two domains and analyzed their relative order, looking for pairs in which the order of these domains was switched. We designed an algorithm to distinguish between real swaps and duplications. In the second approach, we used Blast to detect pairs of proteins that share several fragments. Then, we used an automatic procedure to select pairs that are likely to contain swaps. Those pairs were analyzed visually, using a graphical tool, to eliminate duplications.
Combining these approaches, about 140 different cases of swaps in the Swissprot database were found (after eliminating multiple pairs within the same family). Some of the cases have been described in the literature, but many are novel examples. Although each new example identified may be interesting to analyze, our main conclusion is that cases of swaps are rare in protein evolution. This observation is at odds with the common view that proteins are very modular to the point that modules (e.g., domains) can be shuffled between proteins with minimal constraints. Our study suggests that sequential constraints, i.e., the relative order between domains, are highly conserved.
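The domain-order comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes each protein is already reduced to an ordered list of domain labels, as a Pfam annotation would provide, and it uses a deliberately strict rule (an assumption made here) to avoid counting repeats as swaps: a pair of shared domains is reported only if every copy of one domain precedes every copy of the other in the first protein, and the reverse holds in the second. The function name `find_swaps` and the labels in the example are hypothetical.

```python
from itertools import combinations


def find_swaps(domains_a, domains_b):
    """Return pairs (first, second) of shared domains whose order is
    first-before-second in protein A and second-before-first in protein B.

    domains_a, domains_b: domain labels in N-to-C-terminal order.
    """
    def positions(domains):
        # Map each domain label to the list of indices where it occurs.
        pos = {}
        for index, label in enumerate(domains):
            pos.setdefault(label, []).append(index)
        return pos

    def strictly_before(pos, first, second):
        # True only if ALL copies of `first` precede ALL copies of `second`.
        # Interleaved repeats (e.g. X Y X) fail this test in both directions,
        # so they are never reported as swaps (assumed repeat filter).
        return max(pos[first]) < min(pos[second])

    pos_a, pos_b = positions(domains_a), positions(domains_b)
    shared = sorted(set(pos_a) & set(pos_b))

    swaps = []
    for x, y in combinations(shared, 2):
        if strictly_before(pos_a, x, y) and strictly_before(pos_b, y, x):
            swaps.append((x, y))
        elif strictly_before(pos_a, y, x) and strictly_before(pos_b, x, y):
            swaps.append((y, x))
    return swaps


# The abstract's own example: aXbYc versus dYeXf.
print(find_swaps(["a", "X", "b", "Y", "c"], ["d", "Y", "e", "X", "f"]))
# A repeat of X around Y is ambiguous and is not reported.
print(find_swaps(["X", "Y", "X"], ["Y", "X"]))
```

The strict all-copies rule trades sensitivity for specificity: it will miss genuine swaps in proteins that also contain repeats, which is one reason a real survey would need a more careful procedure or manual inspection, as the abstract describes for the Blast-based approach.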