Berger M P, Munson P J
Analytical Biostatistics Section, National Institutes of Health, Bethesda, MD 20892.
Comput Appl Biosci. 1991 Oct;7(4):479-84. doi: 10.1093/bioinformatics/7.4.479.
The rigorous alignment of multiple protein sequences becomes impractical even with a modest number of sequences, since computer memory and time requirements increase as the product of the lengths of the sequences. We have devised a strategy to approach such an optimal alignment, which modifies the intensive computer storage and time requirements of dynamic programming. Our algorithm randomly divides a group of unaligned sequences into two subgroups, between which an optimal alignment is then obtained by a Needleman-Wunsch style of algorithm. Our algorithm uses a matrix with dimensions corresponding to the lengths of the two aligned sequence subgroups. The pairwise alignment process is repeated using different random divisions of the whole group into two subgroups. Compared with the rigorous approach of solving the n-dimensional lattice by dynamic programming, our iterative algorithm results in alignments that match or are close to the optimal solution, on a limited set of test problems. We have implemented this algorithm in a computer program that runs on the IBM PC class of machines, together with a user-friendly environment for interactively selecting sequences or groups of sequences to be aligned either simultaneously or progressively.
即使序列数量不多,对多个蛋白质序列进行严格比对也变得不切实际,因为计算机内存和时间需求会随着序列长度的乘积而增加。我们设计了一种策略来实现这种最优比对,该策略改变了动态规划对计算机存储和时间的高强度需求。我们的算法将一组未比对的序列随机分成两个子组,然后通过Needleman-Wunsch算法在这两个子组之间获得最优比对。我们的算法使用一个维度与两个比对序列子组长度相对应的矩阵。对整个组进行不同的随机划分成两个子组,重复进行两两比对过程。与通过动态规划解决n维晶格的严格方法相比,在一组有限的测试问题上,我们的迭代算法得出的比对结果与最优解匹配或接近最优解。我们已在运行于IBM PC类机器上的计算机程序中实现了该算法,并提供了一个用户友好的环境,用于交互式选择要同时或逐步比对的序列或序列组。