Bandyopadhyay Sanghamitra, Mitra Ramkrishna
Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India.
IEEE Trans Nanobioscience. 2009 Jun;8(2):139-46. doi: 10.1109/TNB.2009.2019642. Epub 2009 Apr 10.
Researchers are compelled to use heuristic-based pairwise sequence alignment tools instead of the Smith-Waterman (SW) algorithm because of space and time constraints, thereby losing a significant amount of sensitivity. Parallelization is a possible solution, but to date it has been restricted to database searching through database fragmentation. In this paper, the power of a cluster computer is used to develop a parallel algorithm, RPAlign, which first detects regions that are potentially alignable and then performs their actual alignment. RPAlign is found to reduce the time requirement by a factor of up to 9 when used with the Basic Local Alignment Search Tool (BLAST) and up to 99 when used with SW, while keeping the sensitivity similar to that of the corresponding method. For distantly related sequences, which remain undetected by BLAST, RPAlign can be used with SW. Moreover, for megabase-scale sequences, for which SW becomes computationally intractable, the proposed method can still align them reasonably fast with high sensitivity.
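The abstract does not state how RPAlign detects potentially alignable regions or how it distributes work across the cluster. The sketch below therefore only illustrates the general two-stage idea under stated assumptions, and it is not the authors' implementation. Exact shared k-mers stand in for region detection (hypothetical `candidate_regions`, with assumed parameters `k` and `flank`). A plain linear-gap Smith-Waterman scores each region pair (scoring values assumed). A thread pool stands in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def candidate_regions(a, b, k=4, flank=10):
    """HYPOTHETICAL region detection: one window around the first shared
    k-mer on each diagonal. RPAlign's real criterion is not given here."""
    index = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    regions, seen_diagonals = [], set()
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], []):
            if i - j in seen_diagonals:
                continue
            seen_diagonals.add(i - j)
            regions.append((max(0, i - flank), min(len(a), i + k + flank),
                            max(0, j - flank), min(len(b), j + k + flank)))
    return regions


def region_align(a, b, k=4, flank=10, workers=4):
    """Stage 1: find candidate regions. Stage 2: run Smith-Waterman on each
    region pair concurrently and return the best score (0 if none found)."""
    jobs = [(a[s1:e1], b[s2:e2])
            for s1, e1, s2, e2 in candidate_regions(a, b, k, flank)]
    if not jobs:
        return 0
    # Threads stand in for cluster nodes: they show the job split but give
    # no CPU speedup for pure-Python code because of the GIL.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return max(pool.map(lambda job: smith_waterman(*job), jobs))


if __name__ == "__main__":
    a = "TTTTACGTACGTGGGG"
    b = "CCCCACGTACGTAAAA"
    print(region_align(a, b), smith_waterman(a, b))  # prints: 16 16
```

Because each job aligns only a window, the sketch can miss alignments that fall outside every detected region. That is the sensitivity-versus-time trade-off the abstract describes. For real CPU parallelism, each job would go to a separate process or cluster node (for example through MPI) rather than to a thread.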