Succinct data structures for assembling large genomes.

Affiliation

NICTA Victoria Research Laboratory, Department of Computer Science and Engineering, The University of Melbourne, Parkville, Australia.

Publication information

Bioinformatics. 2011 Feb 15;27(4):479-86. doi: 10.1093/bioinformatics/btq697. Epub 2011 Jan 17.

DOI:10.1093/bioinformatics/btq697
PMID:21245053
Abstract

MOTIVATION

Second-generation sequencing technology makes it feasible for many researchers to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool for understanding wide-scale biological variation, but within human biomedicine, it offers a direct way of observing both large-scale structural variation and fine-scale sequence variation. Unfortunately, improvements in the computational feasibility of de novo assembly have not matched the improvements in the gathering of sequence data. This is for two reasons: the inherent computational complexity of the problem and the in-practice memory requirements of tools.

RESULTS

In this article, we use entropy compressed or succinct data structures to create a practical representation of the de Bruijn assembly graph, which requires at least a factor of 10 less storage than the kinds of structures used by deployed methods. Moreover, because our representation is entropy compressed, in the presence of sequencing errors it has better scaling behaviour asymptotically than conventional approaches. We present results of a proof-of-concept assembly of a human genome performed on a modest commodity server.
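
The abstract does not spell out the representation, so the following is only a minimal sketch of the general idea, under stated assumptions: the edges of a de Bruijn graph are treated as a set of (k+1)-mers, each packed into an integer at 2 bits per base, and a sorted list of those integers stands in for the set bits of a sparse bitmap. The names `encode`, `decode`, `EdgeSet`, `rank` and `successors` are hypothetical and chosen for illustration. A real succinct structure would replace the sorted list with a compressed bitmap supporting rank and select.

```python
from bisect import bisect_left

ENC = {"A": 0, "C": 1, "G": 2, "T": 3}
DEC = "ACGT"


def encode(kmer):
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENC[base]
    return value


def decode(value, length):
    """Unpack an integer produced by encode() back into a DNA string."""
    bases = []
    for _ in range(length):
        bases.append(DEC[value & 3])
        value >>= 2
    return "".join(reversed(bases))


class EdgeSet:
    """Edges of a de Bruijn graph as a sorted list of encoded (k+1)-mers.

    Assumption for illustration: the sorted list stands in for the set
    bits of a sparse bitmap of size 4**(k+1), and rank() plays the role
    of that bitmap's rank operation.
    """

    def __init__(self, reads, k):
        self.k = k
        edges = set()
        for read in reads:
            # Every window of k+1 bases is one edge between two k-mers.
            for i in range(len(read) - k):
                edges.add(encode(read[i:i + k + 1]))
        self.edges = sorted(edges)

    def rank(self, value):
        """Number of stored edges whose encoded value is below `value`."""
        return bisect_left(self.edges, value)

    def successors(self, node):
        """Return the k-mers reachable from `node` along one edge."""
        # All edges leaving `node` share its encoding as a prefix, so they
        # occupy one contiguous range of four consecutive bitmap positions.
        base = encode(node) << 2
        lo, hi = self.rank(base), self.rank(base + 4)
        mask = (1 << (2 * self.k)) - 1  # keeps the last k bases of an edge
        return [decode(edge & mask, self.k) for edge in self.edges[lo:hi]]


graph = EdgeSet(["ACGTAC", "ACGTTC"], k=3)
branch = graph.successors("CGT")
```

Because the four possible outgoing edges of a node are adjacent in this ordering, a neighbour query needs only two rank lookups, which is what makes a bitmap-with-rank layout attractive when memory is the bottleneck.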

Similar articles

1
Succinct data structures for assembling large genomes.
Bioinformatics. 2011 Feb 15;27(4):479-86. doi: 10.1093/bioinformatics/btq697. Epub 2011 Jan 17.
2
Efficient de novo assembly of large genomes using compressed data structures.
Genome Res. 2012 Mar;22(3):549-56. doi: 10.1101/gr.126953.111. Epub 2011 Dec 7.
3
Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches.
BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8.
4
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.
BMC Bioinformatics. 2010 Nov 15;11:560. doi: 10.1186/1471-2105-11-560.
5
A space and time-efficient index for the compacted colored de Bruijn graph.
Bioinformatics. 2018 Jul 1;34(13):i169-i177. doi: 10.1093/bioinformatics/bty292.
6
DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies.
Sci Rep. 2016 Aug 30;6:31900. doi: 10.1038/srep31900.
7
Efficient construction of an assembly string graph using the FM-index.
Bioinformatics. 2010 Jun 15;26(12):i367-73. doi: 10.1093/bioinformatics/btq217.
8
FSG: Fast String Graph Construction for De Novo Assembly.
J Comput Biol. 2017 Oct;24(10):953-968. doi: 10.1089/cmb.2017.0089. Epub 2017 Jul 17.
9
Do it yourself guide to genome assembly.
Brief Funct Genomics. 2016 Jan;15(1):1-9. doi: 10.1093/bfgp/elu042. Epub 2014 Nov 11.
10
Assembly of long error-prone reads using de Bruijn graphs.
Proc Natl Acad Sci U S A. 2016 Dec 27;113(52):E8396-E8405. doi: 10.1073/pnas.1604560113. Epub 2016 Dec 12.

Cited by

1
Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets.
Bioinformatics. 2024 Jun 28;40(Suppl 1):i48-i57. doi: 10.1093/bioinformatics/btae217.
2
ACO: lossless quality score compression based on adaptive coding order.
BMC Bioinformatics. 2022 Jun 7;23(1):219. doi: 10.1186/s12859-022-04712-z.
3
A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes.
BMC Bioinformatics. 2021 May 27;22(1):282. doi: 10.1186/s12859-021-04149-w.
4
Enabling Genomics Pipelines in Commodity Personal Computers With Flash Storage.
Front Genet. 2021 Apr 29;12:615958. doi: 10.3389/fgene.2021.615958. eCollection 2021.
5
Representation of k-Mer Sets Using Spectrum-Preserving String Sets.
J Comput Biol. 2021 Apr;28(4):381-394. doi: 10.1089/cmb.2020.0431. Epub 2020 Dec 7.
6
Detection of simple and complex de novo mutations with multiple reference sequences.
Genome Res. 2020 Aug;30(8):1154-1169. doi: 10.1101/gr.255505.119. Epub 2020 Aug 19.
7
Succinct dynamic de Bruijn graphs.
Bioinformatics. 2021 Aug 4;37(14):1946-1952. doi: 10.1093/bioinformatics/btaa546.
8
Portable nanopore analytics: are we there yet?
Bioinformatics. 2020 Aug 15;36(16):4399-4405. doi: 10.1093/bioinformatics/btaa237.
9
Metagenome SNP calling via read-colored de Bruijn graphs.
Bioinformatics. 2021 Apr 1;36(22-23):5275-5281. doi: 10.1093/bioinformatics/btaa081.
10
Building large updatable colored de Bruijn graphs via merging.
Bioinformatics. 2019 Jul 15;35(14):i51-i60. doi: 10.1093/bioinformatics/btz350.