Pekurovsky D, Shindyalov I N, Bourne P E
San Diego Supercomputer Center, University of California San Diego, La Jolla 92093, USA.
Bioinformatics. 2004 Aug 12;20(12):1940-7. doi: 10.1093/bioinformatics/bth184. Epub 2004 Mar 25.
Analysis of large biological data sets on a variety of parallel computer architectures is a common task in bioinformatics. The efficiency of such analysis can be significantly improved by properly handling the redundancy present in these data and by taking advantage of the unique features of these architectures.
We describe a generalized approach to this analysis and present specific results using the program CEPAR, an efficient implementation of the Combinatorial Extension (CE) algorithm in a massively parallel (PAR) mode for finding pairwise protein structure similarities and aligning protein structures from the Protein Data Bank. The design and implementation of CEPAR are described, and results are provided for the efficiency of the algorithm when run on a large number of processors.
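The general idea of removing redundancy before distributing pairwise comparisons can be illustrated with a minimal Python sketch. It is not CEPAR's actual design, which the abstract does not detail. The function names, the identical-sequence redundancy criterion, the round-robin partitioning, and the placeholder `compare` function are all assumptions made for illustration, and a thread pool stands in for a real massively parallel machine.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def representatives(chains):
    """Collapse redundant entries: keep the first ID seen for each distinct sequence.

    Assumption: redundancy means identical sequence strings. A real pipeline
    would likely use a similarity threshold instead.
    """
    seen = {}
    for chain_id, sequence in chains:
        seen.setdefault(sequence, chain_id)
    return sorted(seen.values())


def partition_pairs(ids, n_workers):
    """Deal every unordered pair of IDs out round-robin into n_workers lists."""
    buckets = [[] for _ in range(n_workers)]
    for i, pair in enumerate(combinations(ids, 2)):
        buckets[i % n_workers].append(pair)
    return buckets


def compare(pair):
    """Hypothetical placeholder for a pairwise structure comparison.

    No real alignment is computed; the score slot is left as None.
    """
    a, b = pair
    return (a, b, None)


def run(chains, n_workers=4):
    """Compare all representative pairs, one bucket of pairs per worker."""
    ids = representatives(chains)
    buckets = partition_pairs(ids, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_worker = pool.map(lambda bucket: [compare(p) for p in bucket], buckets)
        return [result for bucket in per_worker for result in bucket]


if __name__ == "__main__":
    # Made-up chain IDs and sequences, for illustration only.
    demo = [("1abc_A", "MKV"), ("2xyz_A", "MKV"), ("3def_A", "GGA"), ("4ghi_B", "TTP")]
    for row in run(demo, n_workers=2):
        print(row)
```

Collapsing duplicates first matters because the number of pairs grows quadratically with the number of entries, so each redundant entry removed saves a comparison against every remaining representative.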
Source code is available by contacting one of the authors.