FLASH: fast length adjustment of short reads to improve genome assemblies.

Author information

McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.

Publication information

Bioinformatics. 2011 Nov 1;27(21):2957-63. doi: 10.1093/bioinformatics/btr507. Epub 2011 Sep 7.

Abstract

MOTIVATION

Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome.
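The overlap idea can be made concrete with a little arithmetic: if both reads have length r and the fragment has length f < 2r, the two reads share roughly 2r - f bases, and merging them yields a single read of length f. For illustration only, 100 bp reads from a 180 bp fragment would overlap by about 20 bp. The Python sketch below shows one simple way such a merge could work. It is not FLASH's implementation: the function names, the `min_overlap` and `max_mismatch_ratio` parameters, and their default values are assumptions made for this example, and quality scores are ignored.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def expected_overlap(read_len, fragment_len):
    """Bases shared by two reads of read_len taken from one fragment."""
    return max(0, 2 * read_len - fragment_len)


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.25):
    """Merge a read pair by overlapping the end of read1 with its mate.

    read2 comes from the opposite strand, so it is reverse-complemented
    first. Every overlap length from the longest possible down to
    min_overlap is scored by mismatches / overlap length, and the lowest
    ratio wins (ties go to the longer overlap). Returns the merged
    sequence, or None when no overlap is good enough.
    Hypothetical parameters; a sketch, not the published algorithm.
    """
    mate = reverse_complement(read2)
    best = None  # (mismatch ratio, overlap length)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        tail = read1[-overlap:]
        head = mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        ratio = mismatches / overlap
        if best is None or ratio < best[0]:
            best = (ratio, overlap)
    if best is None or best[0] > max_mismatch_ratio:
        return None
    # Keep read1 whole and append the part of the mate beyond the overlap.
    return read1 + mate[best[1]:]


# Toy example: a 30 bp fragment sequenced with 20 bp reads from each end.
fragment = "ACGTTGCAAGGCTTACCGATGGATCCTAGA"
read1 = fragment[:20]
read2 = reverse_complement(fragment[-20:])
merged = merge_pair(read1, read2)
```

In this toy case the reads overlap by 2 × 20 − 30 = 10 bases, and the merged read reconstructs the original fragment. A real tool would also have to resolve mismatching bases inside the overlap, for instance by using base qualities.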

RESULTS

We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds.
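For readers unfamiliar with the N50 statistic used to compare the assemblies, the following Python sketch computes it from a list of contig or scaffold lengths. It uses the common definition based on the total assembled length; some evaluations normalize by the genome size instead, and which variant the authors used is not stated in the abstract. The function name and the example lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    Returns 0 for an empty collection.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


# Made-up contig lengths: total 290, so half is 145.
# 80 + 70 = 150 reaches the halfway point, giving an N50 of 70.
example_n50 = n50([80, 70, 50, 40, 30, 20])
```

Because merged reads are longer, an assembler can resolve more repeats from them, which is why an increase in N50 is a reasonable measure of the improvement described above.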

AVAILABILITY AND IMPLEMENTATION

The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash.

CONTACT

t.magoc@gmail.com.
