A comparative evaluation of hybrid error correction methods for error-prone long reads.

Author information

Department of Internal Medicine, University of Iowa, Iowa City, IA, 52242, USA.

Department of Biostatistics, University of Iowa, Iowa City, IA, 52242, USA.

Publication information

Genome Biol. 2019 Feb 4;20(1):26. doi: 10.1186/s13059-018-1605-z.

DOI:10.1186/s13059-018-1605-z

PMID:30717772

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6362602/

Abstract

BACKGROUND

Third-generation sequencing technologies have advanced biological research by generating reads that are substantially longer than those of second-generation sequencing technologies. However, their notoriously high error rate impedes straightforward data analysis and limits their application. A handful of error correction methods for these error-prone long reads have been developed to date. Output data quality is very important for downstream analysis, whereas computing resources can limit the utility of some computationally intensive tools. Standardized assessments of these long-read error-correction methods are lacking.

RESULTS

Here, we present a comparative performance assessment of ten state-of-the-art error-correction methods for long reads. We established a common set of benchmarks for performance assessment, including sensitivity, accuracy, output rate, alignment rate, output read length, run time, and memory usage, as well as the effects of error correction on two downstream applications of long reads: de novo assembly and resolving haplotype sequences.

CONCLUSIONS

Taking into account all of these metrics, we provide a suggestive guideline for method choice based on available data size, computing resources, and individual research goals.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba78/6362602/eef577f2185e/13059_2018_1605_Fig1_HTML.jpg
