• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

在反向搜索方法上使用高级树探索的快速不精确映射。

Fast inexact mapping using advanced tree exploration on backward search methods.

作者信息

Salavert José, Tomás Andrés, Tárraga Joaquín, Medina Ignacio, Dopazo Joaquín, Blanquer Ignacio

机构信息

GRyCAP department of I3M, Universitat Politècnica de València, Building 8B, Camino de vera s/n, Valencia, 46022, Spain.

Bioinformatics department of Centro de Investigación Príncipe Felipe, Autopista del Saler 16, Valencia, 46012, Spain.

出版信息

BMC Bioinformatics. 2015 Jan 28;16:18. doi: 10.1186/s12859-014-0438-3.

DOI:10.1186/s12859-014-0438-3
PMID:25626517
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC4384339/
Abstract

BACKGROUND

Short sequence mapping methods for Next Generation Sequencing consist on a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows Wheeler Transform, the Ferragina and Manzini Index or Suffix Arrays. All these backward search algorithms have excellent performance, but their computational cost highly increases when allowing errors. In this paper, we discuss an inexact mapping algorithm based on pruning strategies for search tree exploration over genomic data.

RESULTS

The proposed algorithm achieves a 13x speed-up over similar algorithms when allowing 6 base errors, including insertions, deletions and mismatches. This algorithm can deal with 400 bps reads with up to 9 errors in a high quality Illumina dataset. In this example, the algorithm works as a preprocessor that reduces by 55% the number of reads to be aligned. Depending on the aligner the overall execution time is reduced between 20-40%.

CONCLUSIONS

Although not intended as a complete sequence mapping tool, the proposed algorithm could be used as a preprocessing step to modern sequence mappers. This step significantly reduces the number reads to be aligned, accelerating overall alignment time. Furthermore, this algorithm could be used for accelerating the seeding step of already available sequence mappers. In addition, an out-of-core index has been implemented for working with large genomes on systems without expensive memory configurations.

摘要

背景

用于下一代测序的短序列映射方法是由种子技术与基于动态规划方法的局部比对相结合组成。大多数种子算法基于反向搜索比对,使用布罗伊登-弗莱彻-戈德法布-香农算法(Burrows Wheeler Transform)、费拉吉纳和曼齐尼索引(Ferragina and Manzini Index)或后缀数组(Suffix Arrays)。所有这些反向搜索算法都具有出色的性能,但在允许错误时其计算成本会大幅增加。在本文中,我们讨论一种基于剪枝策略的不精确映射算法,用于对基因组数据进行搜索树探索。

结果

当允许6个碱基错误(包括插入、缺失和错配)时,所提出的算法比类似算法的速度提高了13倍。该算法可以处理高质量Illumina数据集中长度为400个碱基对且最多有9个错误的读段。在这个例子中,该算法作为预处理器,将需要比对的读段数量减少了55%。根据比对器的不同,总体执行时间减少了20% - 40%。

结论

尽管该算法并非旨在作为一个完整的序列映射工具,但可作为现代序列映射器的预处理步骤。这一步骤显著减少了需要比对的读段数量,加快了总体比对时间。此外,该算法可用于加速现有序列映射器的种子步骤。此外,还实现了一种核外索引,以便在没有昂贵内存配置的系统上处理大型基因组。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c32/4384339/c9504c12c725/12859_2014_438_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c32/4384339/70341694efd9/12859_2014_438_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c32/4384339/4ce1b6e3be1b/12859_2014_438_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c32/4384339/ea00334a044c/12859_2014_438_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c32/4384339/c9504c12c725/12859_2014_438_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c32/4384339/70341694efd9/12859_2014_438_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c32/4384339/4ce1b6e3be1b/12859_2014_438_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c32/4384339/ea00334a044c/12859_2014_438_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c32/4384339/c9504c12c725/12859_2014_438_Fig4_HTML.jpg

相似文献

1
Fast inexact mapping using advanced tree exploration on backward search methods.在反向搜索方法上使用高级树探索的快速不精确映射。
BMC Bioinformatics. 2015 Jan 28;16:18. doi: 10.1186/s12859-014-0438-3.
2
Ψ-RA: a parallel sparse index for genomic read alignment.Ψ-RA:一种用于基因组读取比对的并行稀疏索引。
BMC Genomics. 2011;12 Suppl 2(Suppl 2):S7. doi: 10.1186/1471-2164-12-S2-S7. Epub 2011 Jul 27.
3
CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform.CUSHAW:一种基于 Burrows-Wheeler 变换的适用于大型基因组的 CUDA 兼容短读序列比对程序。
Bioinformatics. 2012 Jul 15;28(14):1830-7. doi: 10.1093/bioinformatics/bts276. Epub 2012 May 9.
4
Fast and accurate short read alignment with Burrows-Wheeler transform.使用Burrows-Wheeler变换进行快速准确的短读比对。
Bioinformatics. 2009 Jul 15;25(14):1754-60. doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18.
5
CLAST: CUDA implemented large-scale alignment search tool.CLAST:基于CUDA实现的大规模比对搜索工具。
BMC Bioinformatics. 2014 Dec 11;15(1):406. doi: 10.1186/s12859-014-0406-y.
6
BatMis: a fast algorithm for k-mismatch mapping.BatMis:一种快速的 k-错配映射算法。
Bioinformatics. 2012 Aug 15;28(16):2122-8. doi: 10.1093/bioinformatics/bts339. Epub 2012 Jun 10.
7
A Long Fragment Aligner called ALFALFA.一个名为ALFALFA的长片段比对工具。
BMC Bioinformatics. 2015 May 15;16(1):159. doi: 10.1186/s12859-015-0533-0.
8
Fast and memory efficient approach for mapping NGS reads to a reference genome.将二代测序(NGS) reads 映射到参考基因组的快速且内存高效的方法。
J Bioinform Comput Biol. 2019 Apr;17(2):1950008. doi: 10.1142/S0219720019500082.
9
A fast read alignment method based on seed-and-vote for next generation sequencing.一种基于种子与投票的用于下一代测序的快速读段比对方法。
BMC Bioinformatics. 2016 Dec 23;17(Suppl 17):466. doi: 10.1186/s12859-016-1329-6.
10
pblat: a multithread blat algorithm speeding up aligning sequences to genomes.pblat:一种多线程 blat 算法,用于加速将序列与基因组对齐。
BMC Bioinformatics. 2019 Jan 15;20(1):28. doi: 10.1186/s12859-019-2597-8.

引用本文的文献

1
OCMA: Fast, Memory-Efficient Factorization of Prohibitively Large Relationship Matrices.OCMA:快速、高效地分解超大关系矩阵。
G3 (Bethesda). 2019 Jan 9;9(1):13-19. doi: 10.1534/g3.118.200908.
2
A new parallel pipeline for DNA methylation analysis of long reads datasets.一种用于长读长数据集DNA甲基化分析的新型并行管道。
BMC Bioinformatics. 2017 Mar 9;18(1):161. doi: 10.1186/s12859-017-1574-3.

本文引用的文献

1
SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner.SOAP3-dp:快速、准确、敏感的基于 GPU 的短读序列比对工具。
PLoS One. 2013 May 31;8(5):e65632. doi: 10.1371/journal.pone.0065632. Print 2013.
2
essaMEM: finding maximal exact matches using enhanced sparse suffix arrays.essaMEM:使用增强型稀疏后缀数组查找最大精确匹配。
Bioinformatics. 2013 Mar 15;29(6):802-4. doi: 10.1093/bioinformatics/btt042. Epub 2013 Jan 24.
3
The GEM mapper: fast, accurate and versatile alignment by filtration.GEM 映射器:通过过滤实现快速、准确和通用的比对。
Nat Methods. 2012 Dec;9(12):1185-8. doi: 10.1038/nmeth.2221. Epub 2012 Oct 28.
4
Long read alignment based on maximal exact match seeds.基于最大精确匹配种子的长读比对。
Bioinformatics. 2012 Sep 15;28(18):i318-i324. doi: 10.1093/bioinformatics/bts414.
5
Fast and accurate read alignment for resequencing.快速准确的重测序读对齐。
Bioinformatics. 2012 Sep 15;28(18):2366-73. doi: 10.1093/bioinformatics/bts450. Epub 2012 Jul 18.
6
Using GPUs for the exact alignment of short-read genetic sequences by means of the Burrows-Wheeler transform.利用图形处理单元(GPU)通过 Burrows-Wheeler 变换实现短读基因序列的精确比对。
IEEE/ACM Trans Comput Biol Bioinform. 2012 Jul-Aug;9(4):1245-56. doi: 10.1109/TCBB.2012.49.
7
Fast gapped-read alignment with Bowtie 2.快速缺口读对准与 Bowtie 2。
Nat Methods. 2012 Mar 4;9(4):357-9. doi: 10.1038/nmeth.1923.
8
SOAP3: ultra-fast GPU-based parallel alignment tool for short reads.SOAP3:基于 GPU 的超快速短读序列并行比对工具。
Bioinformatics. 2012 Mar 15;28(6):878-9. doi: 10.1093/bioinformatics/bts061. Epub 2012 Jan 28.
9
BarraCUDA - a fast short read sequence aligner using graphics processing units.BarraCUDA——一种使用图形处理单元的快速短读序列比对工具。
BMC Res Notes. 2012 Jan 13;5:27. doi: 10.1186/1756-0500-5-27.
10
Ψ-RA: a parallel sparse index for genomic read alignment.Ψ-RA:一种用于基因组读取比对的并行稀疏索引。
BMC Genomics. 2011;12 Suppl 2(Suppl 2):S7. doi: 10.1186/1471-2164-12-S2-S7. Epub 2011 Jul 27.