Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK.
Bioinformatics. 2010 Mar 1;26(5):589-95. doi: 10.1093/bioinformatics/btp698. Epub 2010 Jan 15.
Many programs for aligning short sequencing reads to a reference genome have been developed in the last 2 years. Most of them are very efficient for short reads but inefficient or not applicable for reads >200 bp because the algorithms are heavily and specifically tuned for short queries with low sequencing error rate. However, some sequencing platforms already produce longer reads and others are expected to become available soon. For longer reads, hashing-based software such as BLAT and SSAHA2 remain the only choices. Nonetheless, these methods are substantially slower than short-read aligners in terms of aligned bases per unit time.
We designed and implemented a new algorithm, Burrows-Wheeler Aligner's Smith-Waterman Alignment (BWA-SW), to align long sequences up to 1 Mb against a large sequence database (e.g. the human genome) with a few gigabytes of memory. The algorithm is as accurate as SSAHA2, more accurate than BLAT, and is several to tens of times faster than both.
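The abstract names two ingredients: a Burrows-Wheeler (FM-index) representation of the reference and Smith-Waterman local alignment. The sketch below shows each ingredient in isolation, under stated assumptions. It is not BWA-SW itself. BWA-SW aligns the query against a prefix trie of the reference that is represented by the FM-index, and it adds heuristics that this sketch does not attempt. The function names (`bwt`, `backward_search`, `smith_waterman`) and the scoring values are hypothetical and chosen for illustration only. The gap model is linear rather than affine, for brevity.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for illustration)."""
    s = text + "$"  # '$' sentinel sorts before A, C, G, T in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def backward_search(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern with FM-index backward search."""
    # C[c]: number of characters in the text that sort strictly before c
    C = {}
    for i, c in enumerate(sorted(bwt_str)):
        C.setdefault(c, i)

    # occ(c, i): occurrences of c in bwt_str[:i]
    # (computed naively here; real FM-indexes sample these counts)
    def occ(c: str, i: int) -> int:
        return bwt_str.count(c, 0, i)

    lo, hi = 0, len(bwt_str)  # suffix-array interval [lo, hi)
    for c in reversed(pattern):  # extend the match one character leftwards
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


def smith_waterman(query: str, target: str,
                   match: int = 1, mismatch: int = -3, gap: int = -5) -> int:
    """Best local alignment score with a linear gap penalty (hypothetical scores)."""
    prev = [0] * (len(target) + 1)  # previous DP row
    best = 0
    for i in range(1, len(query) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    index = bwt("ACGTACGT")
    print(backward_search(index, "ACG"))      # → 2
    print(smith_waterman("ACGT", "GGACGTGG"))  # → 4
```

The naive rotation sort and the `str.count` calls keep the example short. They scale poorly, so a genome-scale index would instead build a suffix array and store sampled occurrence counts.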