
DIDA: Distributed Indexing Dispatched Alignment

Authors

Mohamadi Hamid, Vandervalk Benjamin P, Raymond Anthony, Jackman Shaun D, Chu Justin, Breshears Clay P, Birol Inanc

Affiliations

Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada; Department of Bioinformatics, University of British Columbia, Vancouver, BC, Canada; Intel Health and Life Sciences, Intel Corporation, Hillsboro, OR, US.

Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada.

Publication information

PLoS One. 2015 Apr 29;10(4):e0126409. doi: 10.1371/journal.pone.0126409. eCollection 2015.

DOI: 10.1371/journal.pone.0126409
PMID: 25923767
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4414605/

Abstract

One essential application in bioinformatics that is affected by the high-throughput sequencing data deluge is the sequence alignment problem, where nucleotide or amino acid sequences are queried against targets to find regions of close similarity. When queries are too many and/or targets are too large, the alignment process becomes computationally challenging. This is usually addressed by preprocessing techniques, where the queries and/or targets are indexed for easy access while searching for matches. When the target is static, such as in an established reference genome, the cost of indexing is amortized by reusing the generated index. However, when the targets are non-static, such as contigs in the intermediate steps of a de novo assembly process, a new index must be computed for each run. To address such scalability problems, we present DIDA, a novel framework that distributes the indexing and alignment tasks into smaller subtasks over a cluster of compute nodes. It provides a workflow beyond the common practice of embarrassingly parallel implementations. DIDA is a cost-effective, scalable and modular framework for the sequence alignment problem in terms of memory usage and runtime. It can be employed in large-scale alignments to draft genomes and intermediate stages of de novo assembly runs. The DIDA source code, sample files and user manual are available through http://www.bcgsc.ca/platform/bioinfo/software/dida. The software is released under the British Columbia Cancer Agency License (BCCA), and is free for academic use.
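
The workflow the abstract describes (split the targets, index each part, dispatch queries to the relevant parts, align, then merge) can be illustrated with a toy sketch. This is not DIDA's implementation: the abstract does not say how queries are routed, so the per-partition k-mer set used for dispatch, the exact-substring "aligner", the constant `K`, and every function name below are assumptions made for illustration only. On a real cluster each partition would be handled by its own compute node; here the partitions are processed in a simple loop.

```python
from collections import defaultdict

K = 4  # toy k-mer length (hypothetical value, chosen for the example)


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def partition(targets, n_parts):
    """Round-robin split of the target sequences into n_parts groups."""
    parts = [dict() for _ in range(n_parts)]
    for i, (name, seq) in enumerate(sorted(targets.items())):
        parts[i % n_parts][name] = seq
    return parts


def build_index(part):
    """Per-partition k-mer set, used only to decide where a read is sent."""
    index = set()
    for seq in part.values():
        index |= kmers(seq)
    return index


def dispatch(reads, indexes):
    """Route each read to every partition that shares a k-mer with it."""
    routed = defaultdict(list)
    for name, seq in reads.items():
        read_kmers = kmers(seq)
        for p, index in enumerate(indexes):
            if read_kmers & index:
                routed[p].append(name)
    return routed


def align(read, part):
    """Stand-in aligner: exact substring search, returns (target, offset)."""
    hits = []
    for tname, tseq in part.items():
        pos = tseq.find(read)
        if pos != -1:
            hits.append((tname, pos))
    return hits


def run(targets, reads, n_parts=2):
    """Partition, index, dispatch, align per partition, then merge hits."""
    parts = partition(targets, n_parts)
    indexes = [build_index(p) for p in parts]
    routed = dispatch(reads, indexes)
    merged = defaultdict(list)
    for p, names in routed.items():  # each iteration models one node's work
        for name in names:
            merged[name].extend(align(reads[name], parts[p]))
    return {name: sorted(merged[name]) for name in reads}


if __name__ == "__main__":
    targets = {"c1": "ACGTACGTTT", "c2": "GGGGCCCCAA", "c3": "TTTTAAAACC"}
    reads = {"r1": "GTACG", "r2": "CCCCA", "r3": "AAAAC", "r4": "CACAC"}
    for read_name, hits in run(targets, reads).items():
        print(read_name, hits)
```

In this sketch each partition needs only its own small index in memory, and a read that shares no k-mer with a partition is never sent there. That is the memory and runtime benefit the abstract attributes to distributing both indexing and alignment.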

