Mott R
SmithKline-Beecham Pharmaceuticals R & D, New Frontiers Science Park (North),3rd Avenue, Harlow CM19 5AW, UK.
Bioinformatics. 1999 Jun;15(6):455-62. doi: 10.1093/bioinformatics/15.6.455.
Sequence alignments obtained using affine gap penalties are not always biologically correct, because the insertion of long gaps is over-penalised. There is a need for an efficient algorithm which can find local alignments using non-linear gap penalties.
A dynamic programming algorithm is described which computes optimal local sequence alignments for arbitrary, monotonically increasing gap penalties, i.e. where the cost g(k) of inserting a gap of k symbols is such that g(k) >/= g(k-1). The running time of the algorithm is dependent on the scoring scheme; if the expected score of an alignment between random, unrelated sequences of lengths m, n is proportional to log mn, then with one exception, the algorithm has expected running time O(mn). Elsewhere, the running time is no greater than O(mn(m+n)). Optimisations are described which appear to reduce the worst-case run-time to O(mn) in many cases. We show how using a non-affine gap penalty can dramatically increase the probability of detecting a similarity containing a long gap.
The source code is available to academic collaborators under licence.
使用仿射间隙罚分获得的序列比对并不总是符合生物学正确性,因为长间隙的插入受到过度罚分。需要一种高效算法,能够使用非线性间隙罚分来找到局部比对。
描述了一种动态规划算法,该算法可针对任意单调递增的间隙罚分计算最优局部序列比对,即插入k个符号的间隙的代价g(k)满足g(k) >= g(k - 1)。算法的运行时间取决于评分方案;如果长度为m、n的随机不相关序列之间比对的预期分数与log mn成正比,那么除了一种情况外,该算法的预期运行时间为O(mn)。在其他情况下,运行时间不超过O(mn(m + n))。描述了一些优化方法,这些方法在许多情况下似乎能将最坏情况运行时间降低到O(mn)。我们展示了使用非仿射间隙罚分如何能显著提高检测到包含长间隙的相似性的概率。
源代码在许可下可供学术合作者使用。