Deshpande A S, Richards D S, Pearson W R
Department of Computer Science, University of Virginia, Charlottesville 22908.
Comput Appl Biosci. 1991 Apr;7(2):237-47. doi: 10.1093/bioinformatics/7.2.237.
We have written two programs for searching biological sequence databases that run on Intel hypercube computers. PSCANLIB compares a single sequence against a sequence library, and PCOMPLIB compares all the entries in one sequence library against a second library. The programs provide a general framework for similarity searching; they include functions for reading query sequences, search parameters and library entries, and for reporting the results of a search. We have isolated the code for the specific function that calculates the similarity score between the query and library sequence, so alternative searching algorithms can be implemented by editing two files. We have implemented the rapid FASTA sequence comparison algorithm and the more rigorous Smith-Waterman algorithm within this framework. The PSCANLIB program on a 16-node iPSC/2 80386-based hypercube can compare a 229-amino-acid protein sequence with a 3.4-million-residue sequence library in approximately 16 s with the FASTA algorithm. Using the Smith-Waterman algorithm, the same search takes 35 min. The PCOMPLIB program can compare a 0.8-million-amino-acid protein sequence library with itself in 5.3 min with FASTA on a third-generation 32-node Intel iPSC/860 hypercube.
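The abstract's key design point is that the search framework calls a single, isolated scoring function, so the comparison algorithm can be swapped without touching the library-scanning code. The Python sketch below illustrates that separation under stated assumptions; it is not the authors' code. `scan_library`, `score_fn`, `n_nodes` and `top_k` are hypothetical names. The round-robin split of library entries across "nodes" is an assumed stand-in for distributing work over hypercube nodes, and here it runs sequentially. The Smith-Waterman scorer uses affine gap penalties (Gotoh's recurrences) with placeholder match/mismatch scores, not a PAM or BLOSUM substitution matrix.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a scoring function maps (query, library sequence) to an integer score.
ScoreFn = Callable[[str, str], int]

NEG = -10**9  # stands in for minus infinity in the gap matrices


def smith_waterman(query: str, subject: str,
                   match: int = 2, mismatch: int = -1,
                   gap_open: int = -3, gap_extend: int = -1) -> int:
    """Best local alignment score with affine gaps (a gap of length k costs gap_open + (k - 1) * gap_extend).

    Assumption: simple match/mismatch scores replace a real substitution matrix.
    """
    n = len(subject)
    h_prev = [0] * (n + 1)    # H[i-1][j]: best local alignment ending at query[i-1], subject[j-1]
    e_prev = [NEG] * (n + 1)  # E[i-1][j]: alignments ending with a query residue opposite a gap
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * (n + 1)
        e_cur = [NEG] * (n + 1)
        f = NEG  # F[i][j]: alignments ending with a subject residue opposite a gap
        for j in range(1, n + 1):
            e_cur[j] = max(e_prev[j] + gap_extend, h_prev[j] + gap_open)
            f = max(f + gap_extend, h_cur[j - 1] + gap_open)
            diag = h_prev[j - 1] + (match if query[i - 1] == subject[j - 1] else mismatch)
            h_cur[j] = max(0, diag, e_cur[j], f)  # 0 lets a local alignment restart anywhere
            best = max(best, h_cur[j])
        h_prev, e_prev = h_cur, e_cur
    return best


def scan_library(query: str, library: List[Tuple[str, str]], score_fn: ScoreFn,
                 n_nodes: int = 4, top_k: int = 3) -> List[Tuple[int, str]]:
    """Score `query` against every (name, sequence) entry and return the top_k (score, name) hits.

    Hypothetical partitioning: entries are dealt round-robin to n_nodes chunks; a parallel
    version would score each chunk on its own node and merge the partial results.
    """
    chunks = [library[k::n_nodes] for k in range(n_nodes)]
    results: List[Tuple[int, str]] = []
    for chunk in chunks:  # each iteration plays the role of one node's share of the library
        results.extend((score_fn(query, seq), name) for name, seq in chunk)
    results.sort(key=lambda r: (-r[0], r[1]))  # highest score first, ties broken by name
    return results[:top_k]


if __name__ == "__main__":
    library = [("lib1", "ACTTT"), ("lib2", "GGGG"), ("lib3", "ACGTTT")]
    for score, name in scan_library("ACGTTT", library, smith_waterman, n_nodes=2):
        print(name, score)
```

Replacing `smith_waterman` with any other function of the same signature changes the search algorithm without editing `scan_library`. This mirrors the abstract's point that only the scoring code needs to change. For scale, the 229-residue Smith-Waterman search described above fills roughly 229 × 3.4 × 10^6 ≈ 7.8 × 10^8 dynamic-programming cells, and this quadratic inner loop is why it is so much slower than FASTA.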