Department of Electrical and Computer Engineering, Department of Bioengineering and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
Bioinformatics. 2014 May 15;30(10):1354-62. doi: 10.1093/bioinformatics/btu030. Epub 2014 Jan 21.
Rapid advances in next-generation sequencing (NGS) technology have led to an exponential increase in the amount of genomic information. However, NGS reads contain far more errors than data from traditional sequencing methods, and downstream genomic analysis results can be improved by correcting these errors. Unfortunately, all previous error correction methods required a large amount of memory, making them unsuitable for processing reads from large genomes on commodity computers.
We present a novel algorithm that produces accurate correction results with much less memory than previous solutions. The algorithm, named BLoom-filter-based Error correction Solution for high-throughput Sequencing reads (BLESS), uses a single minimum-sized Bloom filter and can also tolerate a higher false-positive rate, which allows us to correct errors with a 40× average reduction in memory usage compared with previous methods. In addition, BLESS can extend reads, as DNA assemblers do, to correct errors at the ends of reads. Evaluations using real and simulated reads showed that BLESS generated more accurate results than existing solutions. After errors were corrected using BLESS, 69% of initially unaligned reads could be aligned correctly. Additionally, de novo assembly results became 50% longer with 66% fewer assembly errors.
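To illustrate the general idea of Bloom-filter-based error detection, here is a minimal Python sketch. It is not BLESS's implementation. Its assumptions are as follows. "Solid" k-mers are those occurring at least `min_count` times, and they are counted naively in memory here, which a memory-efficient tool would avoid. The filter is sized with the standard formulas m = −n ln p / (ln 2)² bits and (m/n) ln 2 hash functions, giving a minimum-sized filter for n k-mers at false-positive rate p. Reverse complements are ignored. The sketch only flags k-mers absent from the filter and does not show the correction step. The names `BloomFilter`, `build_solid_filter`, and `weak_positions` are hypothetical.

```python
import hashlib
import math
from collections import Counter


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative only, not BLESS's code)."""

    def __init__(self, n_items: int, fp_rate: float) -> None:
        # Minimum size for n items at false-positive rate p: m = -n ln p / (ln 2)^2 bits.
        self.m = max(1, math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2))
        # Number of hash functions that minimizes the false-positive rate: (m / n) ln 2.
        self.num_hashes = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return True for absent items (false positive), never False for added ones.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(seq: str, k: int) -> list[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_solid_filter(reads: list[str], k: int, min_count: int,
                       fp_rate: float = 0.01) -> BloomFilter:
    # Hypothetical helper: counts k-mers in memory for simplicity (an assumption).
    counts = Counter(km for read in reads for km in kmers(read, k))
    solid = [km for km, c in counts.items() if c >= min_count]
    bf = BloomFilter(max(1, len(solid)), fp_rate)
    for km in solid:
        bf.add(km)
    return bf


def weak_positions(read: str, bf: BloomFilter, k: int) -> list[int]:
    # Start positions of k-mers absent from the filter, i.e. candidate error regions.
    return [i for i, km in enumerate(kmers(read, k)) if km not in bf]
```

For example, take `reads = ["ACGTACGTAC"] * 3 + ["ACGTTCGTAC"]`, `k = 4`, and `min_count = 2`. The k-mers of `ACGTTCGTAC` that span the substituted base (start positions 1 to 4) would be expected to be reported as weak, unless a false positive happens to mask one of them. Because a Bloom filter never produces false negatives, a read made entirely of solid k-mers is never flagged.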
Freely available at http://sourceforge.net/p/bless-ec
Supplementary data are available at Bioinformatics online.