Rognes T
Department of Molecular Biology, Institute of Medical Microbiology, University of Oslo, The National Hospital, NO-0027 Oslo, Norway.
Nucleic Acids Res. 2001 Apr 1;29(7):1647-52. doi: 10.1093/nar/29.7.1647.
There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
鉴于可获取的基因组序列数据量迅速增加,需要更快且更灵敏的算法用于序列相似性搜索。单指令多数据(SIMD)技术形式的并行处理能力如今在普通微处理器中已具备,能使单个微处理器并行执行许多操作。ParAlign算法就是专门为利用这项技术而设计的。新算法首先利用并行性对比对矩阵中所有对角线的精确最优无间隙比对分数进行非常快速的计算。然后,采用一种新颖的启发式方法,通过组合几条对角线的分数来计算有间隙比对的近似分数。这个近似分数用于选择后续史密斯 - 沃特曼比对中最有趣的数据库序列,该比对也进行了并行化处理。与现有的启发式方法相比,所得方法有显著改进。当使用相同方法计算匹配的统计显著性时,发现ParAlign的灵敏度和特异性与史密斯 - 沃特曼算法的实现效果相当。在速度方面,仅发现灵敏度明显较低的NCBI BLAST 2程序比新方法表现更优。可在http://dna.uio.no/search/进行在线搜索。