Lee Byunghan, Moon Taesup, Yoon Sungroh, Weissman Tsachy
Electrical and Computer Engineering, Seoul National University, Seoul, Korea.
College of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea.
PLoS One. 2017 Jul 27;12(7):e0181463. doi: 10.1371/journal.pone.0181463. eCollection 2017.
We consider the correction of errors in nucleotide sequences produced by next-generation targeted amplicon sequencing. Next-generation sequencing (NGS) platforms produce large volumes of sequencing data thanks to their high throughput, but their error rates tend to be high. Denoising in high-throughput sequencing has thus become a crucial step for improving the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel. It effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that DUDE-Seq not only outperforms existing alternatives in error-correction capability and time efficiency, but also improves the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines through simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq.
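To make the "discrete memoryless channel" setting concrete, the following is a minimal sketch of a two-pass, context-counting denoiser in the style of the discrete universal denoiser (DUDE) for substitution errors only. It is not the DUDE-Seq implementation. The function names (`invert`, `symmetric_channel`, `dude_denoise`), the symmetric substitution channel with error rate `delta`, the Hamming loss, the context length `k`, and the pooling of context counts across all reads are assumptions made for illustration. Reads are assumed to contain only A, C, G and T, and the first and last `k` bases of each read are left unchanged.

```python
from collections import defaultdict

ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("channel matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def symmetric_channel(delta):
    """Assumed noise model: channel[x][z] = P(observe z | true base x)."""
    n = len(ALPHABET)
    return [[1 - delta if i == j else delta / (n - 1) for j in range(n)]
            for i in range(n)]


def dude_denoise(reads, channel, k):
    """Two-pass DUDE-style denoising with two-sided contexts of length k."""
    n = len(ALPHABET)
    inv = invert(channel)

    # Pass 1: for every (left, right) context, count the observed center bases.
    counts = defaultdict(lambda: [0] * n)
    for read in reads:
        for i in range(k, len(read) - k):
            ctx = (read[i - k:i], read[i + 1:i + k + 1])
            counts[ctx][IDX[read[i]]] += 1

    # Pass 2: re-estimate each center base from its context statistics.
    denoised = []
    for read in reads:
        chars = list(read)
        for i in range(k, len(read) - k):
            m = counts[(read[i - k:i], read[i + 1:i + k + 1])]
            z = IDX[read[i]]
            # q[x] estimates how often the true base was x in this context:
            # q = m^T * channel^{-1}
            q = [sum(m[a] * inv[a][x] for a in range(n)) for x in range(n)]
            # Weight by the likelihood of observing z when the truth is x.
            weighted = [q[x] * channel[x][z] for x in range(n)]
            total = sum(weighted)
            # Expected Hamming loss of guessing xh is the weight of all x != xh.
            scores = [total - weighted[xh] for xh in range(n)]
            # On ties, prefer keeping the observed base.
            best = min(range(n), key=lambda xh: (scores[xh], xh != z))
            chars[i] = ALPHABET[best]
        denoised.append("".join(chars))
    return denoised


if __name__ == "__main__":
    reads = ["ACGTACGTAC"] * 100 + ["ACGTTCGTAC"]  # last read has one substitution
    print(dude_denoise(reads, symmetric_channel(0.05), k=2)[-1])
```

In this sketch, adapting to a different platform amounts to swapping the `channel` matrix, which mirrors the abstract's point that the noise model is the only platform-specific ingredient. Homopolymer indel correction is not covered here.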