Cameron Michael, Williams Hugh E, Cannane Adam
School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne, Australia.
IEEE/ACM Trans Comput Biol Bioinform. 2004 Jul-Sep;1(3):116-29. doi: 10.1109/TCBB.2004.32.
Homology search is a key tool for understanding the role, structure, and biochemical function of genomic sequences. The most popular technique for rapid homology search is BLAST, which has been in widespread use within universities, research centers, and commercial enterprises since the early 1990s. In this paper, we propose a new step in the BLAST algorithm to reduce the computational cost of searching with negligible effect on accuracy. This new step-semigapped alignment-compromises between the efficiency of ungapped alignment and the accuracy of gapped alignment, allowing BLAST to accurately filter sequences with lower computational cost. In addition, we propose a heuristic-restricted insertion alignment-that avoids unlikely evolutionary paths with the aim of reducing gapped alignment cost with negligible effect on accuracy. Together, after including an optimization of the local alignment recursion, our two techniques more than double the speed of the gapped alignment stages in BLAST. We conclude that our techniques are an important improvement to the BLAST algorithm. Source code for the alignment algorithms is available for download at http://www.bsg.rmit.edu.au/iga/.
同源性搜索是理解基因组序列的作用、结构和生化功能的关键工具。快速进行同源性搜索最常用的技术是BLAST,自20世纪90年代初以来,它就在大学、研究中心和商业企业中广泛使用。在本文中,我们在BLAST算法中提出了一个新步骤,以降低搜索的计算成本,同时对准确性的影响可以忽略不计。这个新步骤——半间隙比对——在无间隙比对的效率和有间隙比对的准确性之间进行了折衷,使BLAST能够以更低的计算成本准确地筛选序列。此外,我们还提出了一种启发式方法——受限插入比对——以避免不太可能的进化路径,目的是以对准确性影响可忽略不计的方式降低有间隙比对的成本。综合起来,在对局部比对递归进行优化之后,我们的两项技术使BLAST中有间隙比对阶段的速度提高了一倍多。我们得出结论,我们的技术是对BLAST算法的一项重要改进。比对算法的源代码可从http://www.bsg.rmit.edu.au/iga/下载。