Elizabeth Cha I, Rouchka Eric C
University of Louisville, Department of Computer Engineering and Computer Science, Louisville, KY 40292.
Proc IPDPS (Conf). 2005 Apr 4;19:8. doi: 10.1109/IPDPS.2005.145.
The computational power needed to search exponentially growing databases such as GenBank has increased dramatically. Three implementations of the most widely used sequence alignment tool, BLAST (Basic Local Alignment Search Tool), are studied for their efficiency on nucleotide-nucleotide comparisons. The performance of these implementations is evaluated using target databases and query sequences of varying lengths and numbers of entries, constructed from human genomic and EST sequences. In general, WU BLAST was found to be the most efficient when the composition of the database and query is unknown. NCBI BLAST appears to work best when the database contains a small number of sequences, while mpiBLAST shows the power of database distribution when the number of bases per target database is large. The optimal number of compute nodes for mpiBLAST varies with the database, yet in the cases studied it remains surprisingly low.
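In database distribution as used by mpiBLAST, the target database is split into fragments that are searched in parallel on separate compute nodes, and the per-fragment results are then merged. The Python sketch below illustrates that scatter/gather idea under simplifying assumptions. `partition_database`, `fragment_sizes`, and `merge_hits` are hypothetical helpers. Balancing fragments greedily by base count is an assumption, not mpiBLAST's documented segmentation scheme, and the numeric hit scores merely stand in for whatever ranking a real BLAST search would report.

```python
from typing import Dict, List, Tuple


def partition_database(seqs: Dict[str, str], n_fragments: int) -> List[List[str]]:
    """Assign sequences to fragments so that total bases per fragment are balanced.

    Hypothetical helper (greedy "largest first" heuristic); it illustrates
    database segmentation in general, not mpiBLAST's own code.
    """
    if n_fragments < 1:
        raise ValueError("n_fragments must be >= 1")
    fragments: List[List[str]] = [[] for _ in range(n_fragments)]
    sizes = [0] * n_fragments
    # Place the longest sequences first, each on the currently lightest fragment.
    for name, seq in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        lightest = min(range(n_fragments), key=lambda k: sizes[k])
        fragments[lightest].append(name)
        sizes[lightest] += len(seq)
    return fragments


def fragment_sizes(seqs: Dict[str, str], fragments: List[List[str]]) -> List[int]:
    """Total number of bases held by each fragment (i.e. each compute node)."""
    return [sum(len(seqs[name]) for name in frag) for frag in fragments]


def merge_hits(per_fragment_hits: List[List[Tuple[str, float]]],
               top_k: int) -> List[Tuple[str, float]]:
    """Gather step: combine hits found on each fragment and keep the best top_k.

    Scores are placeholders; higher is assumed to mean a better hit.
    """
    all_hits = [hit for hits in per_fragment_hits for hit in hits]
    all_hits.sort(key=lambda hit: hit[1], reverse=True)
    return all_hits[:top_k]


if __name__ == "__main__":
    # Toy "database" of four sequences split across two hypothetical nodes.
    db = {"a": "A" * 10, "b": "C" * 7, "c": "G" * 5, "d": "T" * 3}
    frags = partition_database(db, 2)
    print(frags, fragment_sizes(db, frags))
    print(merge_hits([[("x", 50.0), ("y", 20.0)], [("z", 40.0)]], top_k=2))
```

In this sketch the number of fragments plays the role of the number of compute nodes. The abstract reports that the best node count depends on the database, so in practice it would be chosen by measurement rather than fixed in advance.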