Repeat and haplotype aware error correction in nanopore sequencing reads with DeChat.

Author information

Liu Yuansheng, Li Yichen, Chen Enlian, Xu Jialu, Zhang Wenhai, Zeng Xiangxiang, Luo Xiao

Affiliations

College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.

College of Biology, Hunan University, Changsha, China.

Publication information

Commun Biol. 2024 Dec 19;7(1):1678. doi: 10.1038/s42003-024-07376-y.

DOI:10.1038/s42003-024-07376-y

PMID:39702496

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11659559/

Abstract

Error self-correction is crucial for analyzing long-read sequencing data, but existing methods often struggle with noisy data or are tailored to technologies like PacBio HiFi. Methods optimized for Nanopore R10 simplex reads, which typically have error rates below 2%, are lacking. We introduce DeChat, a novel approach designed specifically for these reads. DeChat performs repeat- and haplotype-aware error correction by combining de Bruijn graphs with variant-aware multiple sequence alignment. This combination avoids overcorrecting reads: variants in repeats and haplotypes are preserved while sequencing errors are accurately corrected. Benchmarking on simulated and real datasets shows that DeChat-corrected reads contain significantly fewer errors than reads corrected by other methods, by up to two orders of magnitude, without losing read information. Furthermore, DeChat-corrected reads clearly improve genome assembly and taxonomic classification.
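To make the de Bruijn graph idea concrete, here is a minimal Python sketch of the k-mer counting step that graph-based correctors commonly start from. It is not DeChat's implementation: the function names, the toy reads, and the parameters `k=4` and `min_count=2` are all assumptions chosen for illustration. K-mers seen often enough across reads are treated as "solid" (likely genuine sequence), while k-mers seen only once mark candidate sequencing errors.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def solid_kmers(reads, k, min_count):
    """Return the set of k-mers seen at least min_count times across all reads."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    return {km for km, c in counts.items() if c >= min_count}


def weak_positions(read, solid, k):
    """Return the start indices of k-mers in read that are not solid."""
    return [i for i, km in enumerate(kmers(read, k)) if km not in solid]


# Toy data (hypothetical): three identical reads plus one read carrying
# a single substitution (C -> A at index 5).
reads = [
    "ACGTACGGTCA",
    "ACGTACGGTCA",
    "ACGTACGGTCA",
    "ACGTAAGGTCA",
]
solid = solid_kmers(reads, k=4, min_count=2)
print(weak_positions(reads[3], solid, k=4))  # -> [2, 3, 4, 5]
```

The four flagged k-mers are exactly those that overlap the substituted base. If the same difference appeared in at least `min_count` reads, as a true haplotype or repeat variant would at sufficient coverage, its k-mers would be solid and the base would not be flagged. That is the intuition behind avoiding overcorrection, although the paper's actual procedure additionally uses variant-aware multiple sequence alignment.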

Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f38/11659559/297a73d4a2d7/42003_2024_7376_Fig1_HTML.jpg
