Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art.

Authors

Chu Justin, Mohamadi Hamid, Warren René L, Yang Chen, Birol Inanç

Affiliations

University of British Columbia, Vancouver, BC V6T 1Z4, Canada.

Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada.

Publication information

Bioinformatics. 2017 Apr 15;33(8):1261-1270. doi: 10.1093/bioinformatics/btw811.

DOI:10.1093/bioinformatics/btw811
PMID:28003261
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5408847/
Abstract

Identifying overlaps between error-prone long reads, specifically those from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PB), is essential for certain downstream applications, including error correction and de novo assembly. Though akin to the read-to-reference alignment problem, read-to-read overlap detection is a distinct problem that can benefit from specialized algorithms that perform efficiently and robustly on high error rate long reads. Here, we review the current state-of-the-art read-to-read overlap tools for error-prone long reads, including BLASR, DALIGNER, MHAP, GraphMap and Minimap. These specialized bioinformatics tools differ not just in their algorithmic designs and methodology, but also in their robustness of performance on a variety of datasets, time and memory efficiency and scalability. We highlight the algorithmic features of these tools, as well as their potential issues and biases when utilizing any particular method. To supplement our review of the algorithms, we benchmarked these tools, tracking their resource needs and computational performance, and assessed the specificity and precision of each. In the versions of the tools tested, we observed that Minimap is the most computationally efficient, specific and sensitive method on the ONT datasets tested; whereas GraphMap and DALIGNER are the most specific and sensitive methods on the tested PB datasets. The concepts surveyed may apply to future sequencing technologies, as scalability is becoming more relevant with increased sequencing throughput.

CONTACT

cjustin@bcgsc.ca, ibirol@bcgsc.ca

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
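
The abstract reports that each tool was assessed for sensitivity, specificity and precision, but does not state how these were computed. The following is a minimal sketch of one plausible way to score overlap calls against a known set of true overlaps. It assumes that the universe of candidates is every unordered pair of reads; the function name `overlap_metrics` and the toy read identifiers are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def overlap_metrics(read_ids, true_pairs, predicted_pairs):
    """Score predicted read-to-read overlaps against a ground-truth set.

    Assumption (not from the paper): every unordered pair of distinct
    reads is a candidate, so true negatives are pairs that neither
    overlap in truth nor were reported by the tool.
    """
    # Treat (a, b) and (b, a) as the same overlap.
    truth = {frozenset(p) for p in true_pairs}
    calls = {frozenset(p) for p in predicted_pairs}
    all_pairs = {frozenset(p) for p in combinations(read_ids, 2)}

    tp = len(truth & calls)              # overlaps correctly reported
    fp = len(calls - truth)              # reported but not real
    fn = len(truth - calls)              # real but missed
    tn = len(all_pairs - truth - calls)  # correctly not reported

    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Toy example with four reads laid out in a chain r1-r2-r3-r4.
reads = ["r1", "r2", "r3", "r4"]
true_overlaps = [("r1", "r2"), ("r2", "r3"), ("r3", "r4")]
reported = [("r2", "r1"), ("r2", "r3"), ("r1", "r4"), ("r1", "r3")]

metrics = overlap_metrics(reads, true_overlaps, reported)
print(metrics)
```

In this toy case two of the three true overlaps are found and two of the four reported overlaps are spurious. Because the number of non-overlapping pairs grows quadratically with the number of reads, specificity on real datasets tends to be close to 1, which is one reason precision is usually reported alongside it.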

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66ec/5408847/7ec6bd240572/btw811f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66ec/5408847/5e8a60eebeae/btw811f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66ec/5408847/e189dab5a312/btw811f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66ec/5408847/64c637e3705c/btw811f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66ec/5408847/65786f766ba0/btw811f5.jpg
