ELECTOR: evaluator for long reads correction methods.

Authors

Marchet Camille, Morisse Pierre, Lecompte Lolita, Lefebvre Arnaud, Lecroq Thierry, Peterlongo Pierre, Limasset Antoine

Affiliations

Univ Rennes, CNRS, Inria, IRISA-UMR 6074, F-35000 Rennes, France.

Univ. Lille, CNRS, UMR 9189 - CRIStAL, 59655 Villeneuve-d'Ascq, France.

Publication information

NAR Genom Bioinform. 2019 Nov 14;2(1):lqz015. doi: 10.1093/nargab/lqz015. eCollection 2020 Mar.

DOI:10.1093/nargab/lqz015
PMID:33575566
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7671326/
Abstract

The error rates of third-generation sequencing data have remained above 5%, with errors consisting mainly of insertions and deletions. Consequently, a growing number of diverse long-read correction methods have been proposed. Because the quality of the correction has a major impact on downstream processes, there is a crucial need for methods that evaluate error correction tools with precise and reliable statistics. These evaluation methods rely on costly alignments to assess the quality of the corrected reads, so they must allow fast comparison of different tools and scale to the increasing length of long reads. Our tool, ELECTOR, evaluates long-read correction and is directly compatible with a wide range of error correction tools. As it is based on multiple sequence alignment, we introduce a new algorithmic strategy for alignment segmentation, which enables it to scale to large instances using reasonable resources. To our knowledge, ELECTOR is the only method that can produce reproducible correction benchmarks on the latest ultra-long reads (>100 kb). It is also faster than the current state of the art on other datasets and provides a wider set of metrics to assess the improvement in read quality after correction. ELECTOR is available on GitHub (https://github.com/kamimrcht/ELECTOR) and Bioconda.
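
To make the idea of alignment-based correction metrics concrete, here is a minimal Python sketch. It is not ELECTOR's implementation. It assumes the reference, uncorrected, and corrected versions of one read have already been aligned into three gapped rows of equal length (with `-` as the gap symbol), and it uses simplified, hypothetical definitions: recall is the share of erroneous columns that end up correct, precision is the share of modified columns that end up correct, and the correct base rate is the share of all columns where the corrected row matches the reference. The names `CorrectionStats` and `evaluate_triple` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class CorrectionStats:
    recall: float             # fixed errors / columns that needed correction
    precision: float          # good modifications / all modified columns
    correct_base_rate: float  # columns matching the reference / all columns


def evaluate_triple(reference: str, uncorrected: str, corrected: str) -> CorrectionStats:
    """Score one corrected read from a three-row gapped alignment.

    Assumption: the rows are already aligned column by column, '-' marks a gap,
    and the metric definitions are simplified for illustration.
    """
    if not (len(reference) == len(uncorrected) == len(corrected)):
        raise ValueError("aligned rows must have equal length")

    needed = fixed = modified = modified_ok = correct = 0
    for ref, raw, cor in zip(reference, uncorrected, corrected):
        is_correct = cor == ref
        correct += is_correct
        if raw != ref:        # the raw read had an error in this column
            needed += 1
            fixed += is_correct
        if cor != raw:        # the corrector changed this column
            modified += 1
            modified_ok += is_correct

    n = len(reference)
    return CorrectionStats(
        recall=fixed / needed if needed else 1.0,
        precision=modified_ok / modified if modified else 1.0,
        correct_base_rate=correct / n if n else 1.0,
    )


# Toy example: two raw errors are fixed, one new error is introduced at the end.
stats = evaluate_triple(
    reference="ACGTACGT",
    uncorrected="ACCTA-GT",
    corrected="ACGTACGA",
)
print(stats)
```

In this toy alignment both raw errors are repaired, but the corrector also changes a base that was already right, which lowers precision without affecting recall. Scoring column by column requires the three-way alignment to exist first, which is the costly step the abstract refers to.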


Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7c/7671326/e98b605e39ac/lqz015fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7c/7671326/af64810ac499/lqz015fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7c/7671326/6962a1b8f590/lqz015fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7c/7671326/3c1f5355bb9b/lqz015fig4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7c/7671326/bcb365ce7396/lqz015fig5.jpg
