Succinct data structures for assembling large genomes.

Affiliation

NICTA Victoria Research Laboratory, Department of Computer Science and Engineering, The University of Melbourne, Parkville, Australia.

Publication information

Bioinformatics. 2011 Feb 15;27(4):479-86. doi: 10.1093/bioinformatics/btq697. Epub 2011 Jan 17.

DOI:10.1093/bioinformatics/btq697

PMID:21245053

Abstract

MOTIVATION

Second-generation sequencing technology makes it feasible for many researchers to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool for understanding wide-scale biological variation, but within human biomedicine, it also offers a direct way of observing both large-scale structural variation and fine-scale sequence variation. Unfortunately, improvements in the computational feasibility of de novo assembly have not matched the improvements in the gathering of sequence data. This is for two reasons: the inherent computational complexity of the problem and the in-practice memory requirements of tools.

RESULTS

In this article, we use entropy-compressed, or succinct, data structures to create a practical representation of the de Bruijn assembly graph, which requires at least a factor of 10 less storage than the kinds of structures used by deployed methods. Moreover, because our representation is entropy compressed, in the presence of sequencing errors it has better asymptotic scaling behaviour than conventional approaches. We present results of a proof-of-concept assembly of a human genome performed on a modest commodity server.
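The abstract does not spell out the encoding, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that each edge of the de Bruijn graph is a (k+1)-mer packed into an integer at 2 bits per base, and that the edge set is conceptually a very sparse bitmap over all 4^(k+1) possible values. A sorted Python list stands in for an entropy-compressed bitmap, and binary search stands in for its rank operation. The names `SparseEdgeSet`, `encode`, `decode`, `rank`, and `successors` are hypothetical.

```python
from bisect import bisect_left

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq):
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value


def decode(value, length):
    """Unpack an integer back into a DNA string of the given length."""
    chars = []
    for _ in range(length):
        chars.append(BASES[value & 3])
        value >>= 2
    return "".join(reversed(chars))


class SparseEdgeSet:
    """De Bruijn graph edges as sorted integer-coded (k+1)-mers.

    Assumption: the sorted list is a stand-in for a compressed bitmap
    with one set bit per edge; a real succinct structure would store
    the same positions in far less space.
    """

    def __init__(self, reads, k):
        self.k = k
        edges = set()
        for read in reads:
            # Every window of k+1 bases is one edge between two k-mers.
            for i in range(len(read) - k):
                edges.add(encode(read[i:i + k + 1]))
        self.edges = sorted(edges)

    def rank(self, position):
        """Count the edges whose code is strictly below `position`."""
        return bisect_left(self.edges, position)

    def successors(self, node):
        """Return the k-mers reachable from `node` along one edge."""
        # All edges leaving `node` share it as a prefix, so their codes
        # occupy the four consecutive positions base .. base + 3.
        base = encode(node) << 2
        lo, hi = self.rank(base), self.rank(base + 4)
        mask = (1 << (2 * self.k)) - 1  # keeps the last k bases
        return [decode(edge & mask, self.k) for edge in self.edges[lo:hi]]


if __name__ == "__main__":
    graph = SparseEdgeSet(["ACGTAC", "ACGGA"], k=3)
    print(graph.successors("ACG"))
```

Because the nodes are implied by the edges, nothing is stored per node; navigating the graph reduces to two rank queries over the edge positions, which is what makes a compressed bitmap a plausible backing store in this sketch.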
