A Hybrid Parallel Strategy Based on String Graph Theory to Improve De Novo DNA Assembly on the TianHe-2 Supercomputer.

Authors

Zhang Feng, Liao Xiangke, Peng Shaoliang, Cui Yingbo, Wang Bingqiang, Zhu Xiaoqian, Liu Jie

Affiliations

Department of Computer Science, National University of Defense Technology, Changsha, 410073, China.

National Supercomputing Center in Shenzhen, Shenzhen, 518055, China.

Publication information

Interdiscip Sci. 2016 Jun;8(2):169-176. doi: 10.1007/s12539-015-0127-6. Epub 2015 Sep 24.

DOI:10.1007/s12539-015-0127-6
PMID:26403255
Abstract

De novo assembly of DNA sequences is increasingly important for biological research in the genomic era. More than a decade after the Human Genome Project, challenges remain and new solutions are being explored to improve de novo genome assembly. String Graph Assembler (SGA), based on string graph theory, is a method and tool developed to address these challenges. In this paper, based on an in-depth analysis of SGA, we prove that SGA-based de novo sequence assembly is an NP-complete problem. According to our analysis, SGA outperforms other similar methods and tools in memory consumption but takes much more time, 60-70% of which is spent on index construction. Based on this analysis, we introduce a hybrid parallel optimization algorithm and implement it in TianHe-2's parallel framework. Simulations are performed with different datasets. For small datasets the optimized solution is 3.06 times faster than before, and for medium-sized datasets it is 1.60 times faster. The results demonstrate a clear performance improvement, with linear scalability for parallel FM-index construction. These results thus contribute significantly to improving the efficiency of de novo assembly of DNA sequences.
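The abstract identifies FM-index construction as the dominant cost in SGA. As a rough illustration of what that index is, the following minimal Python sketch builds an FM-index from a Burrows-Wheeler transform and counts pattern occurrences by backward search. It is not the authors' implementation and does not reflect SGA's actual code: the function names (`suffix_array`, `build_fm_index`, `count_occurrences`) are hypothetical, the suffix array is built naively, and nothing here is parallel.

```python
from collections import Counter


def suffix_array(text):
    # Naive construction by sorting suffixes; for illustration only,
    # far too slow for real sequencing data.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_fm_index(text):
    # Assumption: "$" is a sentinel that sorts before every other symbol.
    text += "$"
    sa = suffix_array(text)
    # BWT: the symbol preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)

    # c_table[c]: number of symbols in text strictly smaller than c.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for sym in occ:
            occ[sym][i + 1] = occ[sym][i] + (1 if sym == ch else 0)
    return bwt, c_table, occ


def count_occurrences(pattern, c_table, occ, n):
    # Backward search: narrow the suffix-array interval [lo, hi)
    # one pattern symbol at a time, from right to left.
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    bwt, c_table, occ = build_fm_index("ACGTACGA")
    print(bwt)
    print(count_occurrences("ACG", c_table, occ, len(bwt)))
```

Once the index exists, each query is cheap: its cost grows with the pattern length, not the text length. The expensive step is building the index, which is consistent with the abstract's observation that construction dominates the running time.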

Similar articles

1
A Hybrid Parallel Strategy Based on String Graph Theory to Improve De Novo DNA Assembly on the TianHe-2 Supercomputer.
Interdiscip Sci. 2016 Jun;8(2):169-176. doi: 10.1007/s12539-015-0127-6. Epub 2015 Sep 24.
2
FSG: Fast String Graph Construction for De Novo Assembly.
J Comput Biol. 2017 Oct;24(10):953-968. doi: 10.1089/cmb.2017.0089. Epub 2017 Jul 17.
3
Efficient de novo assembly of large genomes using compressed data structures.
Genome Res. 2012 Mar;22(3):549-56. doi: 10.1101/gr.126953.111. Epub 2011 Dec 7.
4
FastEtch: A Fast Sketch-Based Assembler for Genomes.
IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1091-1106. doi: 10.1109/TCBB.2017.2737999. Epub 2017 Sep 11.
5
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.
BMC Bioinformatics. 2010 Nov 15;11:560. doi: 10.1186/1471-2105-11-560.
6
Clover: a clustering-oriented de novo assembler for Illumina sequences.
BMC Bioinformatics. 2020 Nov 17;21(1):528. doi: 10.1186/s12859-020-03788-9.
7
Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches.
BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8.
8
LSG: An External-Memory Tool to Compute String Graphs for Next-Generation Sequencing Data Assembly.
J Comput Biol. 2016 Mar;23(3):137-49. doi: 10.1089/cmb.2015.0172.
9
String graph construction using incremental hashing.
Bioinformatics. 2014 Dec 15;30(24):3515-23. doi: 10.1093/bioinformatics/btu578. Epub 2014 Sep 2.
10
Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly.
Bioinformatics. 2012 Jul 15;28(14):1838-44. doi: 10.1093/bioinformatics/bts280. Epub 2012 May 7.