Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing.

Affiliation

Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, USA.

Publication information

Bioinformatics. 2011 Jan 15;27(2):189-95. doi: 10.1093/bioinformatics/btq648. Epub 2010 Nov 18.

Abstract

MOTIVATION

Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and hence are very efficient for shorter queries, but that makes them inefficient or not applicable for reads longer than 200 bp. However, many sequencers are already generating longer reads and more are expected to follow. For long read sequence mapping, there are limited options; BLAT, SSAHA2, FANGS and BWA-SW are among the popular ones. However, resequencing and personalized medicine need much faster software to map these long sequencing reads to a reference genome to identify SNPs or rare transcripts.

RESULTS

We present AGILE (AliGnIng Long rEads), a hash table based high-throughput sequence mapping algorithm for longer 454 reads that uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach among other heuristics to optimize every step of the mapping process. In our experiments, we observe that AGILE is more accurate than BLAT, and comparable to BWA-SW and SSAHA2. For practical error rates (< 5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. Even for the other cases, AGILE is comparable to BWA-SW and several times faster than BLAT and SSAHA2.
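The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind hash-based seeding with a diagonal multiple-seed-match criterion, not AGILE's actual code. The function names, the choice of k, and the use of the standard q-gram lemma as the hit threshold are all assumptions made for illustration; AGILE's "customized" q-gram filter may well differ.

```python
from collections import defaultdict


def build_index(reference, k):
    """Hash table: each k-mer of the reference -> list of start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def qgram_threshold(read_len, q, max_errors):
    """Standard q-gram lemma bound (an assumption here, not AGILE's filter):
    a read with at most max_errors edits shares at least this many q-grams
    with its true location."""
    return read_len - q + 1 - q * max_errors


def candidate_diagonals(read, index, k, min_hits):
    """Count seed hits per diagonal (ref_pos - read_pos) and keep the
    diagonals that collect at least min_hits seed matches."""
    hits = defaultdict(int)
    for read_pos in range(len(read) - k + 1):
        seed = read[read_pos:read_pos + k]
        for ref_pos in index.get(seed, ()):
            hits[ref_pos - read_pos] += 1
    return sorted((d, c) for d, c in hits.items() if c >= min_hits)


if __name__ == "__main__":
    # Toy data, invented for this sketch.
    reference = "TTGACCAGGTACCATGCAAGCTTCGA"
    read = reference[5:17]  # error-free read taken from offset 5
    index = build_index(reference, k=4)
    min_hits = qgram_threshold(len(read), q=4, max_errors=1)
    print(candidate_diagonals(read, index, k=4, min_hits=min_hits))
```

Requiring several seed hits on the same diagonal, rather than a single hit, is what lets a mapper of this kind discard most spurious locations before running a more expensive alignment. Note that only substitutions keep all surviving seeds on one diagonal; handling insertions and deletions would require pooling hits from neighbouring diagonals, which this sketch does not attempt.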

AVAILABILITY

http://www.ece.northwestern.edu/~smi539/agile.html.
