LoRDEC: accurate and efficient long read error correction.

Affiliations

Department of Computer Science and Helsinki Institute for Information Technology HIIT, FI-00014 University of Helsinki, Finland and LIRMM and Institut de Biologie Computationnelle, CNRS and Université Montpellier, 34095 Montpellier Cedex 5, France.

Publication information

Bioinformatics. 2014 Dec 15;30(24):3506-14. doi: 10.1093/bioinformatics/btu538. Epub 2014 Aug 26.

DOI:10.1093/bioinformatics/btu538

PMID:25165095

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4253826/

Abstract

MOTIVATION

PacBio single-molecule real-time (SMRT) sequencing is a third-generation sequencing technique that produces long reads, albeit with comparatively lower throughput and a higher error rate. The errors include numerous indels and complicate downstream analyses such as mapping and de novo assembly. A hybrid strategy that exploits the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping short reads onto long reads provides enough coverage to eliminate up to 99% of the errors; however, this comes at the expense of prohibitive running times and considerable amounts of disk and memory space.

RESULTS

We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. Compared with available tools, LoRDEC is at least six times faster and requires at least 93% less memory or disk space, while achieving comparable accuracy.

AVAILABILITY AND IMPLEMENTATION

LoRDEC is written in C++, tested on Linux platforms and freely available at http://atgc.lirmm.fr/lordec.
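The hybrid correction strategy summarized under Results (a de Bruijn graph built from short reads, then a path search that replaces each erroneous region of a long read) can be illustrated with the simplified Python sketch below. It is not LoRDEC's C++ implementation. It assumes, as simplifications, that the graph is stored as a plain set of "solid" k-mers (k-mers seen at least `solid_threshold` times in the short reads, reverse complements ignored), that an erroneous region is a run of non-solid k-mers flanked by two solid k-mers, and that the corrective sequence is the enumerated path whose spelling has the smallest edit distance to that region. Read ends outside the outermost solid k-mers are left unchanged. All function and parameter names (`build_dbg`, `find_paths`, `correct_read`, `solid_threshold`, `max_extra`, `max_paths`) are hypothetical.

```python
from collections import Counter


def build_dbg(short_reads, k, solid_threshold=2):
    """Return the set of solid k-mers from the short reads.

    The set is the node set of a k-mer de Bruijn graph; edges are implicit
    (u -> v whenever u[1:] == v[:-1] and both are in the set). Reverse
    complements are ignored in this simplified sketch.
    """
    counts = Counter(
        read[i:i + k] for read in short_reads for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, c in counts.items() if c >= solid_threshold}


def edit_distance(a, b):
    """Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def find_paths(graph, source, target, max_len, max_paths=100):
    """Spell out graph paths from source to target of at most max_len bases."""
    paths = []
    stack = [(source, source)]
    while stack and len(paths) < max_paths:
        node, seq = stack.pop()
        if node == target and len(seq) > len(source):
            paths.append(seq)          # reached the right-hand solid k-mer
            continue
        if len(seq) >= max_len:
            continue                   # length bound also stops cycles
        for base in "ACGT":
            nxt = node[1:] + base      # follow an implicit out-edge
            if nxt in graph:
                stack.append((nxt, seq + base))
    return paths


def correct_read(read, graph, k, max_extra=5):
    """Replace each weak region between two solid k-mers with the closest path."""
    solid = [i for i in range(len(read) - k + 1) if read[i:i + k] in graph]
    if not solid:
        return read                    # nothing to anchor on: leave read as is
    out = read[:solid[0] + k]          # head up to the first solid k-mer is kept
    for i, j in zip(solid, solid[1:]):
        if j == i + 1:
            out += read[j + k - 1]     # consecutive solid k-mers: copy one base
            continue
        segment = read[i:j + k]        # weak region plus its two solid anchors
        paths = find_paths(graph, read[i:i + k], read[j:j + k],
                           max_len=len(segment) + max_extra)
        if paths:
            best = min(paths, key=lambda p: edit_distance(p, segment))
            out += best[k:]            # the source k-mer is already in out
        else:
            out += read[i + k:j + k]   # no path found: keep the original bases
    return out + read[solid[-1] + k:]  # tail after the last solid k-mer is kept


if __name__ == "__main__":
    genome = "ATGCGTACGTTAGCCATGAC"
    short_reads = [genome[0:12], genome[5:17], genome[9:20]] * 2
    graph = build_dbg(short_reads, k=5, solid_threshold=2)
    long_read = genome[:9] + "A" + genome[10:]   # substitution T->A at position 9
    print(correct_read(long_read, graph, k=5))   # ATGCGTACGTTAGCCATGAC
```

The `max_extra` bound lets a path be slightly longer than the read segment, which is what allows deletions in the long read to be repaired; `max_paths` caps the search in repetitive graphs, where the number of candidate paths can grow quickly.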

Similar articles

Evaluation and Validation of Assembling Corrected PacBio Long Reads for Microbial Genome Completion via Hybrid Approaches.

PLoS One. 2015 Dec 7;10(12):e0144305. doi: 10.1371/journal.pone.0144305. eCollection 2015.

A hybrid and scalable error correction algorithm for indel and substitution errors of long reads.

BMC Genomics. 2019 Dec 20;20(Suppl 11):948. doi: 10.1186/s12864-019-6286-9.

Accurate self-correction of errors in long reads using de Bruijn graphs.

Bioinformatics. 2017 Mar 15;33(6):799-806. doi: 10.1093/bioinformatics/btw321.

Assembly of long error-prone reads using de Bruijn graphs.

Proc Natl Acad Sci U S A. 2016 Dec 27;113(52):E8396-E8405. doi: 10.1073/pnas.1604560113. Epub 2016 Dec 12.

ARAMIS: From systematic errors of NGS long reads to accurate assemblies.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab170.

Hybrid correction of highly noisy long reads using a variable-order de Bruijn graph.

Bioinformatics. 2018 Dec 15;34(24):4213-4222. doi: 10.1093/bioinformatics/bty521.

Improving the sensitivity of long read overlap detection using grouped short k-mer matches.

BMC Genomics. 2019 Apr 4;20(Suppl 2):190. doi: 10.1186/s12864-019-5475-x.

LRCstats, a tool for evaluating long reads correction methods.

Bioinformatics. 2017 Nov 15;33(22):3652-3654. doi: 10.1093/bioinformatics/btx489.

QuorUM: An Error Corrector for Illumina Reads.

PLoS One. 2015 Jun 17;10(6):e0130821. doi: 10.1371/journal.pone.0130821. eCollection 2015.

Cited by

Integrative Analysis of Iso-Seq and RNA-Seq Identifies Key Genes Related to Fatty Acid Biosynthesis and High-Altitude Stress Adaptation in .

Genes (Basel). 2025 Jul 30;16(8):919. doi: 10.3390/genes16080919.

Assembly and Analysis of the Mitochondrial Genome of subsp. , an Important Ecological and Economic Forest Tree Species in China.

Plants (Basel). 2025 Jul 14;14(14):2170. doi: 10.3390/plants14142170.

Integrated morphological observation, metabolomics, and transcriptomics to investigate the effect of growth years on the quality of Atractylodes macrocephala Koidz.

BMC Plant Biol. 2025 Jul 14;25(1):912. doi: 10.1186/s12870-025-06958-0.

Quantification of single cell-type-specific alternative transcript initiation.

bioRxiv. 2025 May 4:2025.04.29.651292. doi: 10.1101/2025.04.29.651292.

Androgen induces 3'UTR shortening of de novo lipogenesis genes by alternative polyadenylation in prostate cancer cells.

Sci China Life Sci. 2025 Jul 8. doi: 10.1007/s11427-024-2740-7.

Chromosome-level genome of Zoysia sinica in the intertidal zone reveals genomic insights into waterlogging stress adaptation.

Plant Genome. 2025 Sep;18(3):e70070. doi: 10.1002/tpg2.70070.

Complete genome sequence of strain ISA501.

Microbiol Resour Announc. 2025 Jul 10;14(7):e0010625. doi: 10.1128/mra.00106-25. Epub 2025 Jun 20.

Telomere-to-telomere genome assembly of strain ISA502 isolated from maize rhizosphere.

Microbiol Resour Announc. 2025 Jul 10;14(7):e0012225. doi: 10.1128/mra.00122-25. Epub 2025 Jun 18.

Skeleton-Forming Responses of Reef-Building Corals under Ocean Acidification.

Research (Wash D C). 2025 Jun 11;8:0736. doi: 10.34133/research.0736. eCollection 2025.

Characterization of the Complete Mitochondrial Genome of and Its Phylogenetic Status in Viviparidae.

Animals (Basel). 2025 Apr 30;15(9):1284. doi: 10.3390/ani15091284.
