Accelerating read mapping with FastHASH.

Affiliation

Depts. of Computer Science and Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Publication information

BMC Genomics. 2013;14 Suppl 1(Suppl 1):S13. doi: 10.1186/1471-2164-14-S1-S13. Epub 2013 Jan 21.

DOI: 10.1186/1471-2164-14-S1-S13

PMID: 23369189

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3549798/

Abstract

With the introduction of next-generation sequencing (NGS) technologies, we are facing an exponential increase in the amount of genomic sequence data. The success of all medical and genetic applications of next-generation sequencing critically depends on the existence of computational techniques that can process and analyze the enormous amount of sequence data quickly and accurately. Unfortunately, current read mapping algorithms have difficulty coping with the massive amounts of data generated by NGS.

We propose a new algorithm, FastHASH, which drastically improves the performance of seed-and-extend, hash-table-based read mapping algorithms while maintaining the high sensitivity and comprehensiveness of such methods. FastHASH is a generic algorithm compatible with all seed-and-extend class read mapping algorithms. It introduces two main techniques: Adjacency Filtering and Cheap K-mer Selection.

We implemented FastHASH and merged it into the codebase of the popular read mapping program mrFAST. Depending on the edit distance cutoff, we observed up to a 19-fold speedup while still maintaining 100% sensitivity and high comprehensiveness.
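The abstract names Adjacency Filtering and Cheap K-mer Selection without describing them. The sketch below shows one simplified reading of the two ideas on top of a toy hash-table index. It is not the mrFAST or FastHASH implementation. The function names (`build_index`, `contains`, `candidate_locations`), the non-overlapping k-mer split, and the substitution-only error model are all assumptions made for illustration. Cheap k-mer selection is modeled as seeding from the `max_errors + 1` read k-mers with the fewest reference locations, and adjacency filtering as discarding a candidate location unless most of the read's other k-mers also occur at their expected offsets.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(reference, k):
    """Map each k-mer of the reference to its sorted list of start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        # Positions are appended in increasing order, so each list stays sorted.
        index[reference[pos:pos + k]].append(pos)
    return index


def contains(sorted_positions, target):
    """Binary search for target in a sorted sequence of positions."""
    i = bisect_left(sorted_positions, target)
    return i < len(sorted_positions) and sorted_positions[i] == target


def candidate_locations(read, index, k, max_errors):
    """Return read start positions that survive both filters.

    Assumption: errors are substitutions only, so every k-mer of the read
    is expected at a fixed offset from the read's start in the reference.
    """
    # Split the read into non-overlapping k-mers, remembering their offsets.
    kmers = [(off, read[off:off + k]) for off in range(0, len(read) - k + 1, k)]

    # Cheap k-mer selection (sketch): seed from the max_errors + 1 k-mers
    # with the fewest reference locations. With at most max_errors
    # substitutions, at least one of these seeds is error-free.
    seeds = sorted(kmers, key=lambda item: len(index.get(item[1], ())))
    seeds = seeds[:max_errors + 1]

    candidates = set()
    for seed_off, seed in seeds:
        for pos in index.get(seed, ()):
            start = pos - seed_off
            if start < 0 or start in candidates:
                continue
            # Adjacency filtering (sketch): count read k-mers that are NOT
            # found at their expected position next to the seed.
            missing = sum(
                1
                for off, kmer in kmers
                if not contains(index.get(kmer, ()), start + off)
            )
            if missing <= max_errors:
                candidates.add(start)
    return sorted(candidates)


reference = "GATTACAGGCTTAACCGTAGCATCGGA"  # made-up toy sequence
index = build_index(reference, 4)
print(candidate_locations("CAGGCTTAACCG", index, 4, 0))  # exact copy of reference[5:17]
print(candidate_locations("CAGGCTGAACCG", index, 4, 1))  # same read with one substitution
```

Only the locations returned by `candidate_locations` would then be passed to the expensive extend (alignment) step. A real mapper would also have to handle insertions and deletions, which shift the expected offsets, and reads with fewer than `max_errors + 1` k-mers, for which this sketch gives no sensitivity guarantee.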
