一种用于基因序列比较的并行计算方法：具有工作节点间通信的主从范式。

A parallel computing approach to genetic sequence comparison: the master-worker paradigm with interworker communication.

作者信息

Sittig D F, Foulser D, Carriero N, McCorkle G, Miller P L

机构信息

Department of Anesthesiology, Yale University, New Haven, Connecticut 06510.

出版信息

Comput Biomed Res. 1991 Apr;24(2):152-69. doi: 10.1016/0010-4809(91)90027-t.

DOI:10.1016/0010-4809(91)90027-t

PMID:2036781

Abstract

We have implemented a parallel version of a dynamic programming biological sequence comparison algorithm to study the potential applicability of using parallel computers for genetic sequence comparisons. Our parallel program is built using C-Linda, a machine-independent parallel programming language, and was tested on both a 10 CPU Sequent Symmetry and a 64 CPU Intel Hypercube. C-Linda implements a shared associative memory model, "tuple space," through which multiple processes can communicate and coordinate control. In our master-worker (MW) parallel implementation, a master process creates several worker processes, extracts a test sequence and multiple library sequences from a database and stores them in tuple space. Each worker reads the test sequence and then repeatedly extracts library strings from tuple space, performs pairwise sequence comparison using a local comparison algorithm to generate a similarity score, and returns the similarity scores to tuple space. The master collects the scores from tuple space and identifies the best match over all library sequences. We also implemented a method of global interworker communication to reduce the total search time by stopping those string comparisons that had no chance of improving on the current best match. Comparisons of the total run time, speedup, and efficiency were made for parallel and sequential versions of a basic MW implementation as well as versions with the global abort threshold.

摘要

我们实现了一种动态规划生物序列比较算法的并行版本，以研究使用并行计算机进行基因序列比较的潜在适用性。我们的并行程序是使用C-Linda构建的，C-Linda是一种与机器无关的并行编程语言，并且在具有10个CPU的Sequent Symmetry和具有64个CPU的Intel Hypercube上都进行了测试。C-Linda实现了一种共享关联内存模型“元组空间”，多个进程可以通过它进行通信和协调控制。在我们的主从（MW）并行实现中，主进程创建多个工作进程，从数据库中提取一个测试序列和多个库序列，并将它们存储在元组空间中。每个工作进程读取测试序列，然后反复从元组空间中提取库字符串，使用局部比较算法进行成对序列比较以生成相似性得分，并将相似性得分返回给元组空间。主进程从元组空间收集得分，并确定所有库序列中的最佳匹配。我们还实现了一种全局工作进程间通信方法，通过停止那些没有机会改进当前最佳匹配的字符串比较来减少总搜索时间。对基本MW实现的并行和顺序版本以及具有全局中止阈值的版本进行了总运行时间、加速比和效率的比较。