Hung Che-Lun, Lin Yu-Shiang, Lin Chun-Yuan, Chung Yeh-Ching, Chung Yi-Fang
Department of Computer Science and Communication Engineering, Providence University, 200, Sec. 7, Taiwan Boulevard, Shalu Dist., Taichung City 43301, Taiwan.
Department of Computer Science, National Tsing Hua University, 101, Sec. 2, Kuang-Fu Road, Hsinchu City 30013, Taiwan.
Comput Biol Chem. 2015 Oct;58:62-8. doi: 10.1016/j.compbiolchem.2015.05.004. Epub 2015 May 21.
Sequence alignment is an important strategy for analyzing DNA and protein sequences in biological applications. Multiple sequence alignment is an essential methodology for studying biological data and supports tasks such as homology modeling and phylogenetic reconstruction. However, multiple sequence alignment is an NP-hard problem. Over the past decades, progressive approaches have been proposed that successfully align multiple sequences by adopting iterative pairwise alignments. Owing to the rapid growth of next-generation sequencing technologies, a large number of sequences can be produced in a short period of time. When the problem instance is large, progressive alignment becomes time-consuming. Parallel computing is a suitable solution for such applications, and the GPU is one of the important architectures in contemporary parallel computing research. Therefore, in this work we propose a GPU version of ClustalW v2.0.11, called CUDA ClustalW v1.0. The experimental results show that CUDA ClustalW v1.0 achieves a speedup of more than 33× in overall execution time compared with ClustalW v2.0.11.
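To make the cost of the pairwise stage concrete, the following is a minimal Python sketch of all-against-all pairwise scoring, the kind of step a progressive aligner performs before building a guide tree. It is not the authors' CUDA kernel and not ClustalW's scoring scheme: the function names (nw_score, pairwise_scores) and the match, mismatch, and linear gap values are assumptions chosen purely for illustration.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch) with a linear gap penalty.

    The scoring values are illustrative assumptions, not ClustalW defaults.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def pairwise_scores(seqs):
    """Score every unordered pair of sequences.

    Returns a symmetric n x n matrix; the diagonal is left at 0.
    Each pair is independent of the others, so the n*(n-1)/2 calls
    could in principle be distributed across parallel workers.
    """
    n = len(seqs)
    scores = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        scores[i][j] = scores[j][i] = nw_score(seqs[i], seqs[j])
    return scores
```

With n sequences, this stage performs n(n-1)/2 independent alignments, each quadratic in sequence length, which is why the workload grows quickly with the number of sequences and why it lends itself to parallel execution.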