Mathog David R
Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA.
Bioinformatics. 2003 Sep 22;19(14):1865-6. doi: 10.1093/bioinformatics/btg250.
BLAST programs often run on large SMP machines, where multiple threads can work simultaneously and there is enough memory to cache the databases between program runs. A group of programs is described that achieves comparable performance on a Beowulf configuration in which no single node has enough memory to cache a database but the cluster as a whole does. To achieve this, databases are split into equal-sized pieces and stored locally on each node. Each query is run on all nodes in parallel, and the resulting BLAST output files from all nodes are merged to yield the final output.
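The split-and-merge idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the distributed source code: the names `split_database` and `merge_hits`, the `(id, residues)` record layout, and the `(subject id, e-value, bit score)` hit layout are all hypothetical. It assumes that "equal-sized" means roughly equal residue counts per piece, and that each node reports its hits already sorted by ascending e-value, with e-values computed against the size of the whole database so that they are comparable across nodes.

```python
import heapq
import itertools
from typing import Iterable, List, Tuple

Record = Tuple[str, str]        # (sequence id, residues) -- hypothetical layout
Hit = Tuple[str, float, float]  # (subject id, e-value, bit score) -- hypothetical


def split_database(records: Iterable[Record], n_nodes: int) -> List[List[Record]]:
    """Distribute records over n_nodes pieces of roughly equal residue count."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    pieces: List[List[Record]] = [[] for _ in range(n_nodes)]
    # Min-heap of (residues assigned so far, piece index).
    load = [(0, i) for i in range(n_nodes)]
    heapq.heapify(load)
    # Longest sequences first, each assigned to the currently lightest piece.
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        size, idx = heapq.heappop(load)
        pieces[idx].append(rec)
        heapq.heappush(load, (size + len(rec[1]), idx))
    return pieces


def merge_hits(per_node_hits: Iterable[List[Hit]], max_hits: int = 10) -> List[Hit]:
    """Merge per-node hit lists (each sorted by ascending e-value) into one list."""
    # Assumes e-values were computed against the full database size on every node.
    merged = heapq.merge(*per_node_hits, key=lambda h: h[1])
    return list(itertools.islice(merged, max_hits))


if __name__ == "__main__":
    db = [("a", "AAAA"), ("b", "CC"), ("c", "GG"), ("d", "TTTT")]
    for i, piece in enumerate(split_database(db, 2)):
        print(i, [rid for rid, _ in piece])
    node0 = [("s1", 1e-50, 200.0), ("s3", 1e-5, 50.0)]
    node1 = [("s2", 1e-20, 100.0)]
    print([h[0] for h in merge_hits([node0, node1])])
```

In this sketch the split balances residue totals rather than sequence counts, since search time and memory use scale with the amount of sequence held by each node; the merge is a plain k-way merge, so it never needs to hold more than one pending hit per node.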
Source code is available from ftp://saf.bio.caltech.edu/