Wilbur W J, Lipman D J
Proc Natl Acad Sci U S A. 1983 Feb;80(3):726-30. doi: 10.1073/pnas.80.3.726.
With the development of large data banks of protein and nucleic acid sequences, the need for efficient methods of searching such banks for sequences similar to a given sequence has become evident. We present an algorithm for the global comparison of sequences based on matching k-tuples of sequence elements for a fixed k. The method results in substantial reduction in the time required to search a data bank when compared with prior techniques of similarity analysis, with minimal loss in sensitivity. The algorithm has also been adapted, in a separate implementation, to produce rigorous sequence alignments. Currently, using the DEC KL-10 system, we can compare all sequences in the entire Protein Data Bank of the National Biomedical Research Foundation with a 350-residue query sequence in less than 3 min and carry out a similar analysis with a 500-base query sequence against all eukaryotic sequences in the Los Alamos Nucleic Acid Data Base in less than 2 min.
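The k-tuple matching idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the query's k-tuples are stored in a lookup table, the data bank sequence is scanned once, and each shared k-tuple is credited to its diagonal (query position minus target position). The function names and the rule of scoring by the densest diagonal are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def ktuple_index(seq, k):
    """Map each k-tuple in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_counts(query, target, k):
    """Count shared k-tuples on each diagonal d = i - j.

    i is a start position in the query, j a start position in the target.
    """
    index = ktuple_index(query, k)
    counts = defaultdict(int)
    for j in range(len(target) - k + 1):
        # Look up every query position holding the same k-tuple.
        for i in index.get(target[j:j + k], ()):
            counts[i - j] += 1
    return dict(counts)


def best_diagonal(query, target, k):
    """Return (diagonal, match_count) for the densest diagonal, or None.

    Illustrative scoring only: ties go to the diagonal nearest the main one.
    """
    counts = diagonal_counts(query, target, k)
    if not counts:
        return None
    return max(counts.items(), key=lambda item: (item[1], -abs(item[0])))
```

For example, `best_diagonal("ACGTACGT", "TTACGTAC", 3)` returns `(-2, 4)`: four 3-tuples line up when the query is shifted two positions along the target. The lookup table makes the cost of the scan depend on the number of k-tuple hits rather than on the product of the two sequence lengths, which is the kind of saving the abstract attributes to the method.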