Assembler for de novo assembly of large genomes.

Affiliation

Institute of Information Science, Academia Sinica, Taipei 115, Taiwan.

Publication information

Proc Natl Acad Sci U S A. 2013 Sep 3;110(36):E3417-24. doi: 10.1073/pnas.1314090110. Epub 2013 Aug 21.

DOI:10.1073/pnas.1314090110

PMID:23966565

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3767511/

Abstract

Assembling a large genome using next-generation sequencing reads requires large computer memory and a long execution time. To reduce these requirements, we propose an extension-based assembler, called JR-Assembler, where J and R stand for "jumping" extension and read "remapping." First, it uses the read count to select good-quality reads as seeds. Second, it extends each seed by a whole-read extension process, which expedites the extension process and can jump over short repeats. Third, it uses a dynamic back trimming process to avoid extension termination due to sequencing errors. Fourth, it remaps reads to each assembled sequence, and if an assembly error is caused by the presence of a repeat, it breaks the contig at the repeat boundaries. Fifth, it applies a less stringent extension criterion to connect low-coverage regions. Finally, it merges contigs using unused reads. An extensive comparison of JR-Assembler with current assemblers using datasets from small, medium, and large genomes shows that JR-Assembler achieves a better or comparable overall assembly quality and requires lower memory use and less central processing unit time, especially for large genomes. Finally, a simulation study shows that JR-Assembler outperforms most current assemblers in memory use and central processing unit time when the read length is 150 bp or longer, indicating that the advantages of JR-Assembler over current assemblers will increase as the read length increases with advances in next-generation sequencing technology.
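The first two steps of the abstract (seed selection by read count, then extension one whole read at a time) can be illustrated with a toy sketch. This is not the JR-Assembler implementation: the function names `select_seeds` and `extend_right`, the `min_count` and `min_overlap` parameters, and the reading of "read count" as the number of identical copies of a read are all assumptions made for illustration. Back trimming, remapping, repeat handling, and contig merging are left out.

```python
from collections import Counter


def select_seeds(reads, min_count=2):
    """Return distinct reads seen at least min_count times, most frequent first.

    Assumption: a read observed several times is less likely to carry a
    sequencing error, so it makes a safer starting point.
    """
    counts = Counter(reads)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [read for read, n in ranked if n >= min_count]


def extend_right(seed, reads, min_overlap=4):
    """Greedily extend seed to the right, appending one whole read per step."""
    contig = seed
    unused = set(reads) - {seed}
    while True:
        best_read, best_len = None, 0
        for read in sorted(unused):  # sorted only to make the result deterministic
            # Find the longest contig suffix that is a proper prefix of this read.
            longest = min(len(read) - 1, len(contig))
            for k in range(longest, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    if k > best_len:
                        best_read, best_len = read, k
                    break
        if best_read is None:
            return contig  # no read overlaps the contig end any more
        # Append the entire non-overlapping tail of the read in one step,
        # rather than growing the contig one base at a time.
        contig += best_read[best_len:]
        unused.discard(best_read)
```

Appending the whole tail of the best-overlapping read is what makes each step advance by many bases at once. The sketch stops at the first position where no read overlaps by at least `min_overlap` bases, which is the situation the abstract's back trimming and relaxed extension criterion are meant to handle.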

