Liu Yuansheng, Li Yichen, Chen Enlian, Xu Jialu, Zhang Wenhai, Zeng Xiangxiang, Luo Xiao
College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.
College of Biology, Hunan University, Changsha, China.
Commun Biol. 2024 Dec 19;7(1):1678. doi: 10.1038/s42003-024-07376-y.
Error self-correction is crucial for analyzing long-read sequencing data, but existing methods often struggle with noisy data or are tailored to technologies like PacBio HiFi. There is a gap in methods optimized for Nanopore R10 simplex reads, which typically have error rates below 2%. We introduce DeChat, a novel approach designed specifically for these reads. DeChat enables repeat- and haplotype-aware error correction, leveraging the strengths of both de Bruijn graphs and variant-aware multiple sequence alignment to create a synergistic approach. This approach avoids read overcorrection, ensuring that variants in repeats and haplotypes are preserved while sequencing errors are accurately corrected. Benchmarking on simulated and real datasets shows that DeChat-corrected reads contain significantly fewer errors than reads corrected by other methods, by up to two orders of magnitude, without losing read information. Furthermore, DeChat-corrected reads clearly improve genome assembly and taxonomic classification.