Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing.

Affiliation

Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, USA.

Publication information

Bioinformatics. 2011 Jan 15;27(2):189-95. doi: 10.1093/bioinformatics/btq648. Epub 2010 Nov 18.

DOI:10.1093/bioinformatics/btq648

PMID:21088030

Abstract

MOTIVATION

Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and hence are very efficient for shorter queries, but that makes them inefficient or not applicable for reads longer than 200 bp. However, many sequencers are already generating longer reads and more are expected to follow. For long read sequence mapping, there are limited options; BLAT, SSAHA2, FANGS and BWA-SW are among the popular ones. However, resequencing and personalized medicine need much faster software to map these long sequencing reads to a reference genome to identify SNPs or rare transcripts.

RESULTS

We present AGILE (AliGnIng Long rEads), a hash table based high-throughput sequence mapping algorithm for longer 454 reads that uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach among other heuristics to optimize every step of the mapping process. In our experiments, we observe that AGILE is more accurate than BLAT, and comparable to BWA-SW and SSAHA2. For practical error rates (< 5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. Even for the other cases, AGILE is comparable to BWA-SW and several times faster than BLAT and SSAHA2.
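The abstract does not spell out AGILE's internals, so the following is only a generic sketch of the family of techniques it names: a hash table of reference q-grams, seed hits grouped by diagonal, and a q-gram count filter. The function names (`build_index`, `qgram_threshold`, `candidate_diagonals`), the parameters and the threshold formula (the standard q-gram lemma) are assumptions made for illustration, not AGILE's actual code, and the sketch tolerates substitutions only, since indels would spread hits across neighbouring diagonals.

```python
from collections import defaultdict


def build_index(reference, q):
    """Hash table mapping each q-gram to the reference positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def qgram_threshold(read_len, q, max_errors):
    """q-gram lemma: each error destroys at most q of the read's q-grams."""
    return max(1, (read_len - q + 1) - q * max_errors)


def candidate_diagonals(read, index, q, max_errors):
    """Return diagonals (ref_pos - read_pos) with enough seed hits to pass the filter.

    Hypothetical simplification: hits are counted per exact diagonal, so only
    mismatches are tolerated; a real mapper would also merge nearby diagonals.
    """
    hits = defaultdict(int)
    for read_pos in range(len(read) - q + 1):
        seed = read[read_pos:read_pos + q]
        # .get avoids inserting empty entries for q-grams absent from the reference
        for ref_pos in index.get(seed, ()):
            hits[ref_pos - read_pos] += 1
    needed = qgram_threshold(len(read), q, max_errors)
    return sorted(d for d, count in hits.items() if count >= needed)
```

A caller would build the index once with `build_index(reference, q)` and then pass each read to `candidate_diagonals`; every surviving diagonal marks a reference offset worth verifying with a full alignment. Requiring several seed hits on one diagonal, rather than a single seed, is what keeps the number of expensive verification steps low.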

AVAILABILITY

http://www.ece.northwestern.edu/~smi539/agile.html.

Similar articles

YOABS: yet other aligner of biological sequences--an efficient linearly scaling nucleotide aligner.

Bioinformatics. 2012 Apr 15;28(8):1070-7. doi: 10.1093/bioinformatics/bts102. Epub 2012 Mar 7.

Fast and accurate long-read alignment with Burrows-Wheeler transform.

Bioinformatics. 2010 Mar 1;26(5):589-95. doi: 10.1093/bioinformatics/btp698. Epub 2010 Jan 15.

FR-HIT, a very fast program to recruit metagenomic reads to homologous reference genomes.

Bioinformatics. 2011 Jun 15;27(12):1704-5. doi: 10.1093/bioinformatics/btr252. Epub 2011 Apr 19.

A fast read alignment method based on seed-and-vote for next generation sequencing.

BMC Bioinformatics. 2016 Dec 23;17(Suppl 17):466. doi: 10.1186/s12859-016-1329-6.

Accelerating the Next Generation Long Read Mapping with the FPGA-Based System.

IEEE/ACM Trans Comput Biol Bioinform. 2014 Sep-Oct;11(5):840-52. doi: 10.1109/TCBB.2014.2326876.

Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics.

Genomics. 2017 Jul;109(3-4):186-191. doi: 10.1016/j.ygeno.2017.03.001. Epub 2017 Mar 9.

MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence.

BMC Genomics. 2014 Nov 15;15(1):969. doi: 10.1186/1471-2164-15-969.

Assessing the impact of exact reads on reducing the error rate of read mapping.

BMC Bioinformatics. 2018 Nov 6;19(1):406. doi: 10.1186/s12859-018-2432-7.

ARYANA: Aligning Reads by Yet Another Approach.

BMC Bioinformatics. 2014;15 Suppl 9(Suppl 9):S12. doi: 10.1186/1471-2105-15-S9-S12. Epub 2014 Sep 10.

Cited by

Technology dictates algorithms: recent developments in read alignment.

Genome Biol. 2021 Aug 26;22(1):249. doi: 10.1186/s13059-021-02443-7.

Short Read Mapping: An Algorithmic Tour.

Proc IEEE Inst Electr Electron Eng. 2017 Mar;105(3):436-458. doi: 10.1109/JPROC.2015.2455551. Epub 2015 Sep 7.

HIA: a genome mapper using hybrid index-based sequence alignment.

Algorithms Mol Biol. 2015 Dec 23;10:30. doi: 10.1186/s13015-015-0062-4. eCollection 2015.

Whole genome sequencing as a means to assess pathogenic mutations in medical genetics and cancer.

Cell Mol Life Sci. 2015 Apr;72(8):1463-71. doi: 10.1007/s00018-014-1807-9. Epub 2014 Dec 30.

Experience of targeted Usher exome sequencing as a clinical test.

Mol Genet Genomic Med. 2014 Jan;2(1):30-43. doi: 10.1002/mgg3.25. Epub 2013 Jul 10.

The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote.

Nucleic Acids Res. 2013 May 1;41(10):e108. doi: 10.1093/nar/gkt214. Epub 2013 Apr 4.

YAHA: fast and flexible long-read alignment with optimal breakpoint detection.

Bioinformatics. 2012 Oct 1;28(19):2417-24. doi: 10.1093/bioinformatics/bts456. Epub 2012 Jul 24.
