Error filtering, pair assembly and error correction for next-generation sequencing reads.

Author information

Tiburon, CA 94920, USA and.

Department of Micro- and Nanotechnology, Technical University of Denmark, DK-2800 Lyngby, Denmark.

Publication information

Bioinformatics. 2015 Nov 1;31(21):3476-82. doi: 10.1093/bioinformatics/btv401. Epub 2015 Jul 2.

Abstract

MOTIVATION

Next-generation sequencing produces vast amounts of data with errors that are difficult to distinguish from true biological variation when coverage is low.

RESULTS

We demonstrate large reductions in error frequencies, especially for high-error-rate reads, by three independent means: (i) filtering reads according to their expected number of errors, (ii) assembling overlapping read pairs and (iii) for amplicon reads, by exploiting unique sequence abundances to perform error correction. We also show that most published paired read assemblers calculate incorrect posterior quality scores.
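To make methods (i) and (ii) concrete, here is a minimal Python sketch, not the USEARCH implementation. It assumes Phred+33 quality encoding, takes the expected number of errors in a read to be the sum of its per-base error probabilities, and derives the merged-base error probability from Bayes' rule under the simplifying assumptions that errors in the two reads are independent and that a wrong base is equally likely to be any of the other three. The function names and the `max_ee=1.0` threshold are illustrative choices, not taken from the abstract.

```python
PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ ASCII encoding


def error_probs(quality_string, offset=PHRED_OFFSET):
    """Convert a FASTQ quality string into per-base error probabilities."""
    return [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]


def expected_errors(quality_string, offset=PHRED_OFFSET):
    """Expected number of errors in a read: the sum of its error probabilities."""
    return sum(error_probs(quality_string, offset))


def passes_filter(quality_string, max_ee=1.0):
    """Keep a read only if its expected error count is at most max_ee.

    max_ee=1.0 is a hypothetical threshold chosen for illustration.
    """
    return expected_errors(quality_string) <= max_ee


def merged_error_prob(p_x, p_y, agree):
    """Posterior error probability for one aligned position of a read pair.

    p_x and p_y are the error probabilities of the forward and reverse base
    calls (both must be > 0, which always holds for Phred-derived values).
    If the calls disagree, the base with the smaller error probability is
    kept and the returned value is the probability that this kept base is
    wrong.
    """
    if agree:
        return (p_x * p_y / 3) / (1 - p_x - p_y + 4 * p_x * p_y / 3)
    kept, other = min(p_x, p_y), max(p_x, p_y)
    return kept * (1 - other / 3) / (kept + other - 4 * kept * other / 3)


if __name__ == "__main__":
    print(expected_errors("IIII"))             # four Q40 bases
    print(passes_filter("+" * 20))             # twenty Q10 bases
    print(merged_error_prob(0.01, 0.01, True))
    print(merged_error_prob(0.01, 0.1, False))
```

Under this model, agreement between the two reads lowers the error probability below either input, while disagreement raises it above that of the better base call. Two equally confident but conflicting calls give a value close to 0.5.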

AVAILABILITY AND IMPLEMENTATION

These methods are implemented in the USEARCH package. Binaries are freely available at http://drive5.com/usearch.

CONTACT

robert@drive5.com

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
