Accurate self-correction of errors in long reads using de Bruijn graphs.

Authors

Leena Salmela, Riku Walve, Eric Rivals, Esko Ukkonen

Affiliations

Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.

LIRMM and Institut de Biologie Computationnelle, CNRS and Université Montpellier, Montpellier, France.

Publication information

Bioinformatics. 2017 Mar 15;33(6):799-806. doi: 10.1093/bioinformatics/btw321.

DOI:10.1093/bioinformatics/btw321

PMID:27273673

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5351550/

Abstract

MOTIVATION

New long read sequencing technologies, like PacBio SMRT and Oxford Nanopore, can produce sequencing reads up to 50 000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent utilization of the reads in, e.g. de novo genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second generation sequencing technologies to correct the long reads.

RESULTS

We present an error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of k-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher.
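The first phase described above can be illustrated with a deliberately simplified sketch. This is not LoRMA's implementation: it is a toy that assumes substitution errors only, treats k-mers seen at least `min_count` times as trustworthy ("solid") graph nodes, and repairs a read only where the graph offers a single unambiguous continuation. The function names, the `min_count` threshold, the k values, and the example sequences are all hypothetical choices made for illustration. The second phase (polishing with multiple alignments) is not shown.

```python
from collections import Counter


def solid_kmers(reads, k, min_count):
    """Return k-mers seen at least min_count times (nodes of the toy graph)."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_count}


def successors(kmer, solid):
    """Solid k-mers that overlap kmer by k-1 bases (outgoing de Bruijn edges)."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in solid]


def correct_read(read, solid, k):
    """Repair isolated substitutions by following unambiguous graph edges."""
    bases = list(read)
    for i in range(1, len(bases) - k + 1):
        prev = "".join(bases[i - 1:i - 1 + k])
        cur = "".join(bases[i:i + k])
        if prev in solid and cur not in solid:
            candidates = successors(prev, solid)
            if len(candidates) == 1:  # unambiguous edge: adopt its last base
                bases[i + k - 1] = candidates[0][-1]
    return "".join(bases)


def iterative_correction(reads, ks, min_count):
    """Rebuild the graph from the current reads for each k, smallest first."""
    for k in ks:
        solid = solid_kmers(reads, k, min_count)
        reads = [correct_read(read, solid, k) for read in reads]
    return reads


# Hypothetical toy data: three clean copies and one read with a substitution.
genome = "ACGTTGCAATCGGATCCATG"
noisy = "ACGTTGTAATCGGATCCATG"
reads = [genome, genome, genome, noisy]
corrected = iterative_correction(reads, ks=(4, 6), min_count=2)
print(corrected[3] == genome)  # True
```

In this sketch each round rebuilds the graph from the reads corrected in the previous round, so later rounds with larger k start from cleaner input. Real long reads are dominated by insertions and deletions, which this toy does not handle.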

AVAILABILITY AND IMPLEMENTATION

LoRMA is freely available at http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/ .

CONTACT

leena.salmela@cs.helsinki.fi.
