Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.
Bioinformatics. 2012 Sep 15;28(18):2366-73. doi: 10.1093/bioinformatics/bts450. Epub 2012 Jul 18.
Next-generation sequence analysis has become an important task in both laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. The accurate alignment of reads with large indels is a computationally challenging task for researchers.
We introduce SeqAlto, a new algorithm for read alignment. For reads of 100 bp or longer, SeqAlto is up to 10× faster than existing algorithms, while retaining high accuracy and the ability to align reads with large (up to 50 bp) indels. This improvement in efficiency is particularly important for the analysis of future sequencing data, where the number of reads approaches many billions. Furthermore, SeqAlto uses less than 8 GB of memory to align against the human genome. SeqAlto is benchmarked against several existing tools on both real and simulated data.
Linux and Mac OS X binaries free for academic use are available at http://www.stanford.edu/group/wonglab/seqalto