Al-Neama Mohammed W, Reda Naglaa M, Ghaleb Fayed F M
Department of Mathematics, Faculty of Science, Al-Azhar University, Cairo, Egypt; Education College for Girls, Mosul University, Mosul, Iraq.
Department of Mathematics, Faculty of Science, Ain Shams University, Cairo, Egypt.
Biomed Res Int. 2014;2014:406178. doi: 10.1155/2014/406178. Epub 2014 Jun 12.
Distance matrices are used in many research areas. Computing them is an essential step in most bioinformatics applications, especially multiple sequence alignment. The rapid growth of biological sequence databases creates an urgent need to accelerate these computations. The DistVect algorithm was introduced by Al-Neama et al. (in press) as an approach to vectorizing distance matrix computation, and it performed efficiently in both sequential and parallel settings. However, the multicore cluster systems now available, with their scalability and performance/cost ratio, can meet the need for more powerful and efficient computation. This paper proposes DistVect1, a highly efficient parallel vectorized algorithm for computing the distance matrix on multicore clusters. It reformulates the DistVect1 vectorized algorithm in terms of cluster primitives and derives an efficient approach to partitioning and scheduling the computations that suits this type of architecture. The implementations use both the MPI and OpenMP libraries. Experimental results show that the proposed method achieves around a 3-fold speedup over SSE2. It also achieves speedups of more than 9 orders of magnitude compared to the publicly available parallel implementation used in ClustalW-MPI.
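The abstract does not spell out how the pairwise computations are partitioned across the cluster, so the following is only a minimal sketch of one common scheme, not the DistVect1 method itself. It assumes a simple mismatch-fraction distance on equal-length sequences (a stand-in for the alignment-based distance a tool like ClustalW would use) and a cyclic assignment of upper-triangle rows to workers. The names `p_distance`, `rows_for_worker`, `partial_distances`, and `distance_matrix` are hypothetical, and the workers are simulated by a serial loop where an MPI program would run separate ranks.

```python
def p_distance(a, b):
    """Assumed stand-in metric: fraction of mismatching positions
    between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def rows_for_worker(n, rank, size):
    """Cyclic row assignment. Row i of the upper triangle holds n-1-i
    pairs, so interleaving rows spreads long and short rows evenly."""
    return range(rank, n, size)


def partial_distances(seqs, rows):
    """Compute the upper-triangle entries for the rows owned by one worker."""
    n = len(seqs)
    return {
        (i, j): p_distance(seqs[i], seqs[j])
        for i in rows
        for j in range(i + 1, n)
    }


def distance_matrix(seqs, size=4):
    """Gather the partial results of `size` simulated workers into a
    full symmetric matrix (in MPI this would be a gather step)."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for rank in range(size):
        part = partial_distances(seqs, rows_for_worker(n, rank, size))
        for (i, j), value in part.items():
            d[i][j] = d[j][i] = value
    return d
```

Because every pair (i, j) with i < j belongs to exactly one row, and every row belongs to exactly one worker, the result does not depend on the number of workers; only the load balance does.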