Assessment of the parallelization approach of d2_cluster for high-performance sequence clustering.

Author information

Carpenter John E, Christoffels Alan, Weinbach Yael, Hide Winston A

Affiliation

SGI, 655E Lone Oak Drive, Eagan, Minnesota 55121, USA.

Publication information

J Comput Chem. 2002 May;23(7):755-7. doi: 10.1002/jcc.10025.

DOI:10.1002/jcc.10025

PMID:11948594

Abstract

The exponential increase in expressed sequence tag (EST) sequence data amplifies the computational cost of clustering sequences such that new algorithms are required to analyze data at a greater rate. We have parallelized d2_cluster on an SGI Origin 2000 multiprocessor and observed a speedup of approximately 100x on 126 processors when processing a dataset of 15,876 ESTs. The parallelized d2_cluster code is obtainable from the SANBI website (http://www.sanbi.ac.za/CODES).
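
To put the reported figures in context, the short sketch below derives two standard scaling metrics from the numbers in the abstract (a speedup of about 100x on 126 processors). The function names are hypothetical and not part of d2_cluster. The serial-fraction estimate uses the Karp-Flatt metric, which is an assumption brought in here for illustration; the abstract itself reports only the speedup.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup achieved: S / p."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


def karp_flatt_serial_fraction(speedup: float, processors: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    e = (1/S - 1/p) / (1 - 1/p), defined for p > 1.
    """
    if processors <= 1:
        raise ValueError("processors must be greater than 1")
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    # Figures taken from the abstract; the speedup is stated as approximate.
    speedup, processors = 100.0, 126
    print(f"efficiency: {parallel_efficiency(speedup, processors):.3f}")
    print(f"serial fraction: {karp_flatt_serial_fraction(speedup, processors):.5f}")
```

Under these assumptions, the reported run corresponds to a parallel efficiency of roughly 79% and an implied serial fraction of about 0.2%. Because the speedup is given only approximately, both values should be read as rough estimates.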
