Gotoh O
Department of Biochemistry, Saitama Cancer Center Research Institute, Japan.
Comput Appl Biosci. 1987 Mar;3(1):17-20. doi: 10.1093/bioinformatics/3.1.17.
Existing methods for getting the locally best matched alignments between a pair of biological sequences require O(N2) computational steps and O(N2) storage, where N is the average sequence length. An improved method is presented with which the storage requirement is greatly reduced, while the computational steps remain O(N2). Only a small number of additional steps are required to display any common sub-sequences with similarity scores greater than a given threshold. The aligments found by the algorithm are optimal in the sense that their scores are locally maximal, where each score is a sum of weights given to individual matches/replacements, insertions and deletions involved in the alignment. The algorithm was implemented in C programming language on a personal computer. Data area of 64 kbytes on random access memory and a few hundred kbytes on a disk is sufficient for comparing two protein or nucleic acid sequences of 2500 residues. The programs are particularly valuable when used in combination with fast sequence search programs.
现有用于获取一对生物序列之间局部最佳匹配比对的方法需要O(N2)的计算步骤和O(N2)的存储空间,其中N是平均序列长度。本文提出了一种改进方法,该方法在计算步骤仍为O(N2)的情况下,大大降低了存储需求。只需少量额外步骤就能显示任何相似度得分大于给定阈值的公共子序列。该算法找到的比对在其得分局部最大的意义上是最优的,其中每个得分是赋予比对中各个匹配/替换、插入和删除的权重之和。该算法用C编程语言在个人计算机上实现。对于比较两个长度为2500个残基的蛋白质或核酸序列,随机存取存储器中64千字节的数据区域和磁盘上几百千字节的空间就足够了。当与快速序列搜索程序结合使用时,这些程序特别有价值。