Data Set-Adaptive Minimizer Order Reduces Memory Usage in k-Mer Counting.

Author information

Flomin Dan, Pellow David, Shamir Ron

Affiliation

Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel.

Publication information

J Comput Biol. 2022 Aug;29(8):825-838. doi: 10.1089/cmb.2021.0599. Epub 2022 May 6.

DOI:10.1089/cmb.2021.0599
PMID:35527644
Abstract

The rapid, continuous growth of deep sequencing experiments requires the development and improvement of many bioinformatic applications for analyzing large sequencing data sets, including k-mer counting and assembly. Several applications reduce memory usage by binning sequences. Binning is done using minimizer schemes, which rely on a specific order of the minimizers. It has been demonstrated that the choice of order has a major impact on the performance of these applications. Here we introduce a method for tailoring the order to the data set. Our method repeatedly samples the data set and modifies the order so as to flatten the k-mer load distribution across minimizers. We integrated our method into Gerbil, a state-of-the-art memory-efficient k-mer counter, and were able to reduce its memory footprint by 30%-50% for large k, with only a minor increase in runtime. Our tests also showed that the orders produced by our method yielded superior results when transferred across data sets from the same species, with little or no order change. This enables memory reduction with essentially no increase in runtime.
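The abstract describes two ideas: binning k-mers by their minimizer under a chosen order, and adapting that order by repeatedly sampling the data and flattening the per-minimizer k-mer load. The Python sketch below illustrates both under several assumptions that are not taken from the paper: minimizers are computed on the forward strand only (no canonical k-mers), the starting order is lexicographic, and the update rule (move the most-loaded minimizer that has not been moved yet to the end of the order, and return the order with the smallest sampled maximum-bin fraction) is a simple greedy rule chosen for illustration, not necessarily the update used by the authors or in Gerbil. All function names (`minimizer`, `kmer_load`, `lexicographic_order`, `adapt_order`) are hypothetical.

```python
import itertools
import random
from collections import Counter


def minimizer(kmer, m, rank):
    """Return the m-mer of `kmer` with the smallest rank under the given order."""
    return min((kmer[i:i + m] for i in range(len(kmer) - m + 1)),
               key=rank.__getitem__)


def kmer_load(reads, k, m, rank):
    """Count how many k-mer occurrences fall into each minimizer bin."""
    load = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            load[minimizer(read[i:i + k], m, rank)] += 1
    return load


def lexicographic_order(m):
    """All 4^m m-mers over ACGT in lexicographic order (assumed starting order)."""
    return ["".join(p) for p in itertools.product("ACGT", repeat=m)]


def adapt_order(reads, k, m, iterations=20, sample_size=50, seed=0):
    """Hypothetical sampling-based order adaptation (illustrative greedy rule).

    Each iteration draws a fresh random sample of reads, measures the k-mer load
    per minimizer under the current order, and moves the most-loaded minimizer
    not moved before to the end of the order (lowest priority). The order with
    the smallest sampled max-bin fraction seen so far is returned.
    """
    rng = random.Random(seed)
    order = lexicographic_order(m)
    moved = set()
    best_order, best_frac = list(order), float("inf")
    for _ in range(iterations):
        sample = rng.sample(reads, min(sample_size, len(reads)))
        rank = {mm: i for i, mm in enumerate(order)}
        load = kmer_load(sample, k, m, rank)
        total = sum(load.values())
        if total == 0:  # reads shorter than k: nothing to measure
            break
        frac = max(load.values()) / total
        if frac < best_frac:  # keep the flattest order observed so far
            best_order, best_frac = list(order), frac
        candidates = [mm for mm, _ in load.most_common() if mm not in moved]
        if not candidates:
            break
        heaviest = candidates[0]
        moved.add(heaviest)
        order.remove(heaviest)
        order.append(heaviest)  # demote the heaviest bin's minimizer
    return best_order


if __name__ == "__main__":
    # Synthetic random reads, used only to exercise the sketch.
    gen = random.Random(1)
    reads = ["".join(gen.choice("ACGT") for _ in range(100)) for _ in range(200)]
    k, m = 15, 4
    for name, order in (("lexicographic", lexicographic_order(m)),
                        ("adapted", adapt_order(reads, k, m))):
        rank = {mm: i for i, mm in enumerate(order)}
        load = kmer_load(reads, k, m, rank)
        print(name, "max bin load:", max(load.values()), "of", sum(load.values()))
```

Running the script prints the largest bin under the lexicographic and the adapted order on random reads. The numbers depend on the data, and this greedy rule is not guaranteed to flatten the distribution on every input; tracking the best order seen is one simple safeguard against an update that makes a different bin heavier.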

Similar articles

1. Data Set-Adaptive Minimizer Order Reduces Memory Usage in k-Mer Counting.
   J Comput Biol. 2022 Aug;29(8):825-838. doi: 10.1089/cmb.2021.0599. Epub 2022 May 6.
2. Efficient minimizer orders for large values of k using minimum decycling sets.
   Genome Res. 2023 Jul;33(7):1154-1161. doi: 10.1101/gr.277644.123. Epub 2023 Aug 9.
3. A simple refined DNA minimizer operator enables 2-fold faster computation.
   Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae045.
4. Compact and evenly distributed k-mer binning for genomic sequences.
   Bioinformatics. 2021 Sep 9;37(17):2563-2569. doi: 10.1093/bioinformatics/btab156.
5. Creating and Using Minimizer Sketches in Computational Genomics.
   J Comput Biol. 2023 Dec;30(12):1251-1276. doi: 10.1089/cmb.2023.0094. Epub 2023 Aug 30.
6. Improved design and analysis of practical minimizers.
   Bioinformatics. 2020 Jul 1;36(Suppl_1):i119-i127. doi: 10.1093/bioinformatics/btaa472.
7. Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters.
   J Comput Biol. 2017 Jun;24(6):547-557. doi: 10.1089/cmb.2016.0155. Epub 2016 Nov 9.
8. Differentiable Learning of Sequence-Specific Minimizer Schemes with DeepMinimizer.
   J Comput Biol. 2022 Dec;29(12):1288-1304. doi: 10.1089/cmb.2022.0275. Epub 2022 Sep 12.
9. Comparing fixed sampling with minimizer sampling when using k-mer indexes to find maximal exact matches.
   PLoS One. 2018 Feb 1;13(2):e0189960. doi: 10.1371/journal.pone.0189960. eCollection 2018.
10. A general near-exact k-mer counting method with low memory consumption enables de novo assembly of 106× human sequence data in 2.7 hours.
   Bioinformatics. 2020 Dec 30;36(Suppl_2):i625-i633. doi: 10.1093/bioinformatics/btaa890.

Cited by

1. Density and Conservation Optimization of the Generalized Masked-Minimizer Sketching Scheme.
   J Comput Biol. 2024 Jan;31(1):2-20. doi: 10.1089/cmb.2023.0212. Epub 2023 Nov 17.
2. Creating and Using Minimizer Sketches in Computational Genomics.
   J Comput Biol. 2023 Dec;30(12):1251-1276. doi: 10.1089/cmb.2023.0094. Epub 2023 Aug 30.
3. Efficient minimizer orders for large values of k using minimum decycling sets.
   Genome Res. 2023 Jul;33(7):1154-1161. doi: 10.1101/gr.277644.123. Epub 2023 Aug 9.
4. Navigating bottlenecks and trade-offs in genomic data analysis.
   Nat Rev Genet. 2023 Apr;24(4):235-250. doi: 10.1038/s41576-022-00551-z. Epub 2022 Dec 7.