BLESS: bloom filter-based error correction solution for high-throughput sequencing reads.

Affiliation

Department of Electrical and Computer Engineering, Department of Bioengineering and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Publication information

Bioinformatics. 2014 May 15;30(10):1354-62. doi: 10.1093/bioinformatics/btu030. Epub 2014 Jan 21.

Abstract

MOTIVATION

Rapid advances in next-generation sequencing (NGS) technology have led to an exponential increase in the amount of genomic information. However, NGS reads contain far more errors than data from traditional sequencing methods, and downstream genomic analysis results can be improved by correcting the errors. Unfortunately, all previous error correction methods required a large amount of memory, making them unsuitable for processing reads from large genomes on commodity computers.

RESULTS

We present a novel algorithm that produces accurate correction results with much less memory compared with previous solutions. The algorithm, named BLoom-filter-based Error correction Solution for high-throughput Sequencing reads (BLESS), uses a single minimum-sized Bloom filter and is also able to tolerate a higher false-positive rate, thus allowing us to correct errors with a 40× memory usage reduction on average compared with previous methods. In addition, BLESS can extend reads, as DNA assemblers do, to correct errors at the ends of reads. Evaluations using real and simulated reads showed that BLESS could generate more accurate results than existing solutions. After errors were corrected using BLESS, 69% of initially unaligned reads could be aligned correctly. Additionally, de novo assembly results became 50% longer with 66% fewer assembly errors.
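The Results paragraph summarizes the idea behind Bloom-filter-based correction. K-mers that occur often enough across the reads ("solid" k-mers) are stored in a Bloom filter. A base becomes suspect when k-mers covering it are missing from that filter.

The following Python sketch illustrates this idea under stated assumptions. It is not BLESS's implementation. The names `BloomFilter`, `build_solid_filter` and `correct_read`, the `min_count` threshold, and the greedy single-substitution strategy are all hypothetical. The sketch also simplifies in three ways:

- It ignores reverse complements.
- It counts k-mers exactly in memory rather than with a dedicated k-mer counter.
- It does not extend reads past their ends.

```python
import hashlib
import math
from collections import Counter


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative sketch, not BLESS's code)."""

    def __init__(self, n_items: int, fp_rate: float) -> None:
        n = max(1, n_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits and h = (m / n) ln 2 hash functions.
        self.m = max(1, math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.h = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive h bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        a = int.from_bytes(digest[:8], "little")
        b = int.from_bytes(digest[8:16], "little") | 1
        return [(a + i * b) % self.m for i in range(self.h)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # May return True for items never added (false positive), never False for added ones.
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def kmers(seq: str, k: int) -> list[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_solid_filter(reads: list[str], k: int, min_count: int, fp_rate: float = 0.01) -> BloomFilter:
    # Hypothetical simplification: exact in-memory counting; only k-mers seen
    # at least min_count times ("solid") are inserted into the filter.
    counts = Counter(km for read in reads for km in kmers(read, k))
    solid = [km for km, c in counts.items() if c >= min_count]
    bf = BloomFilter(len(solid), fp_rate)
    for km in solid:
        bf.add(km)
    return bf


def _covering_kmers_solid(seq: list[str], i: int, k: int, bf: BloomFilter) -> bool:
    # True if every k-mer of seq that overlaps position i is (reported) solid.
    s = "".join(seq)
    lo, hi = max(0, i - k + 1), min(i, len(s) - k)
    return all(s[j:j + k] in bf for j in range(lo, hi + 1))


def correct_read(read: str, bf: BloomFilter, k: int) -> str:
    # Greedy single-substitution correction: where any k-mer covering a base is
    # weak, try the other bases and keep the first that makes all covering k-mers solid.
    seq = list(read)
    for i in range(len(seq)):
        if _covering_kmers_solid(seq, i, k, bf):
            continue
        original = seq[i]
        for base in "ACGT":
            if base == original:
                continue
            seq[i] = base
            if _covering_kmers_solid(seq, i, k, bf):
                break
        else:
            seq[i] = original  # no substitution fixed it; leave the base unchanged
    return "".join(seq)
```

A Bloom filter never yields false negatives, so every solid k-mer is always found. A false positive can only make an erroneous k-mer look solid. In this sketch, a substitution is accepted only when every k-mer overlapping the base is solid, so a single false positive is usually not enough to accept a wrong correction. This gives some intuition for why a higher false-positive rate, and thus a smaller filter, can be tolerated. The mechanism BLESS itself uses to handle false positives is described in the paper, not reproduced here.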

AVAILABILITY AND IMPLEMENTATION

Freely available at http://sourceforge.net/p/bless-ec

CONTACT

dchen@illinois.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
