
Scalable long read self-correction and assembly polishing with multiple sequence alignment.

Affiliations

Univ Rennes, Inria, CNRS, IRISA, 35000, Rennes, France.

Univ. Lille, CNRS, UMR 9189 - CRIStAL, 59000, Lille, France.

Publication information

Sci Rep. 2021 Jan 12;11(1):761. doi: 10.1038/s41598-020-80757-5.

DOI: 10.1038/s41598-020-80757-5
PMID: 33436980
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7804095/
Abstract

Third-generation sequencing technologies can sequence long reads of tens of kbp, which are expected to solve various problems. However, these reads display high error rates, currently capped around 10%. Self-correction is thus regularly used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies on both multiple sequence alignment and local de Bruijn graphs. To ensure scalability, multiple sequence alignment computation benefits from a new and efficient segmentation strategy, allowing a massive speedup. CONSENT compares well to the state of the art and performs better on real Oxford Nanopore data. Specifically, CONSENT is the only method that efficiently scales to ultra-long reads, and it can process a full human dataset, containing reads reaching up to 1.5 Mbp, in 10 days. Moreover, our experiments show that error correction with CONSENT improves the quality of Flye assemblies. Additionally, CONSENT implements a polishing feature for correcting raw assemblies. Our experiments show that CONSENT is 2 to 38 times faster than other polishing tools while providing comparable results. Furthermore, we show that, on a human dataset, assembling the raw data and polishing the assembly consumes fewer resources than correcting and then assembling the reads, while providing better results. CONSENT is available at https://github.com/morispi/CONSENT.
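
To make the idea of consensus from a multiple sequence alignment concrete, here is a minimal sketch in Python. It is not CONSENT's implementation: it assumes the reads covering a window have already been aligned into equal-length rows with `-` marking gaps, and it takes a plain column-wise majority vote. The function name `column_consensus` and the toy rows are hypothetical and serve only as illustration; the segmentation strategy and the local de Bruijn graph step mentioned in the abstract are not modeled here.

```python
from collections import Counter


def column_consensus(aligned_rows, gap="-"):
    """Majority-vote consensus over pre-aligned, equal-length rows.

    Illustrative only: assumes the alignment was computed elsewhere.
    """
    if not aligned_rows:
        return ""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*aligned_rows):
        # Pick the most frequent symbol in this column of the alignment.
        symbol, _count = Counter(column).most_common(1)[0]
        # A column whose majority symbol is a gap contributes no base.
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Toy alignment of four noisy copies of the same region (hypothetical data).
rows = [
    "ACGT-ACGTA",
    "ACGTTACG-A",
    "ACCT-ACGTA",
    "ACGT-ACGTA",
]
print(column_consensus(rows))  # prints "ACGTACGTA"
```

In this toy example the isolated insertion, deletion, and substitution are each outvoted by the other three rows, which shows why self-correction needs enough overlapping reads per window.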


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2cd8/7804095/1798ea77eff1/41598_2020_80757_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2cd8/7804095/3f783e7ca0ac/41598_2020_80757_Fig2_HTML.jpg
