Uliel S, Fliess A, Amir A, Unger R
Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel.
Bioinformatics. 1999 Nov;15(11):930-6. doi: 10.1093/bioinformatics/15.11.930.
Circular permutation of a protein is a genetic operation in which part of the C-terminal of the protein is moved to its N-terminal. Recently, it has been shown that proteins that undergo engineered circular permutations generally maintain their three dimensional structure and biological function. This observation raises the possibility that circular permutation has occurred in Nature during evolution. In this scenario a protein underwent circular permutation into another protein, thereafter both proteins further diverged by standard genetic operations. To study this possibility one needs an efficient algorithm that for a given pair of proteins can detect the underlying event of circular permutations. A possible formal description of the question is: given two sequences, find a circular permutation of one of them under which the edit distance between the proteins is minimal. A naive algorithm might take time proportional to N3 or even N4, which is prohibitively slow for a large-scale survey. A sophisticated algorithm that runs in asymptotic time of N2 was recently suggested, but it is not practical for a large-scale survey.
A simple and efficient algorithm that runs in time N2 is presented. The algorithm is based on duplicating one of the two sequences, and then performing a modified version of the standard dynamic programming algorithm. While the algorithm is not guaranteed to find the optimal results, we present data that indicate that in practice the algorithm performs very well.
A Fortran program that calculates the optimal edit distance under circular permutation is available upon request from the authors.
蛋白质的环形排列是一种基因操作,其中蛋白质的部分C末端被移动到其N末端。最近的研究表明,经过工程化环形排列的蛋白质通常能保持其三维结构和生物学功能。这一观察结果增加了环形排列在自然进化过程中发生的可能性。在这种情况下,一种蛋白质经过环形排列变成了另一种蛋白质,此后这两种蛋白质通过标准基因操作进一步分化。为了研究这种可能性,需要一种高效的算法,对于给定的一对蛋白质,该算法能够检测到潜在的环形排列事件。这个问题的一种可能的形式化描述是:给定两个序列,找到其中一个序列的环形排列,使得这两种蛋白质之间的编辑距离最小。一个简单的算法可能需要与N3甚至N4成比例的时间,对于大规模的研究来说,这太慢了。最近有人提出了一种在渐近时间N2内运行的复杂算法,但它对于大规模研究并不实用。
提出了一种在时间N2内运行的简单高效算法。该算法基于复制两个序列中的一个,然后执行标准动态规划算法的修改版本。虽然该算法不能保证找到最优结果,但我们提供的数据表明,在实际应用中该算法表现良好。
可根据作者要求提供一个Fortran程序,该程序可计算环形排列下的最优编辑距离。