Differentiable Learning of Sequence-Specific Minimizer Schemes with DeepMinimizer.

Affiliations

Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

Publication information

J Comput Biol. 2022 Dec;29(12):1288-1304. doi: 10.1089/cmb.2022.0275. Epub 2022 Sep 12.

DOI:10.1089/cmb.2022.0275

PMID:36095142

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9807081/

Abstract

Minimizers are widely used to sample representative k-mers from biological sequences in many applications, such as read mapping and taxonomy prediction. In most scenarios, it is desirable for the minimizer scheme to select as few k-mer positions as possible (i.e., to have a low density) in order to reduce computation and memory cost. Despite the growing interest in minimizers, learning an effective scheme with optimal density is still an open question, as it requires solving an apparently challenging discrete optimization problem on the permutation space of k-mer orderings. Most existing schemes are designed to work well in expectation over random sequences, which limits their applicability to many practical tools. On the other hand, several methods have been proposed to construct minimizer schemes for a specific target sequence. These methods, however, only approximate the original objective with surrogate tasks that are likewise discrete and that do not significantly improve density. This article introduces the first continuous relaxation of the density-minimizing objective, DeepMinimizer, which employs a novel deep learning twin architecture to simultaneously ensure both the validity and the performance of the minimizer scheme. Our surrogate objective is fully differentiable and, therefore, amenable to efficient gradient-based optimization using GPU computing. Finally, we demonstrate that DeepMinimizer discovers minimizer schemes that significantly outperform state-of-the-art constructions on human genomic sequences.
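To make the notions of "minimizer scheme" and "density" concrete, the following is a minimal Python sketch, not the DeepMinimizer method itself. It assumes the common (w, k) formulation: in every window of w consecutive k-mers, the smallest k-mer under a chosen ordering is selected, with ties broken by the leftmost position. Density is taken here as the number of distinct selected positions divided by the total number of k-mers in the sequence. The function names `minimizer_positions` and `density`, the window parameter `w`, and the `order` callback are illustrative assumptions and do not come from the article.

```python
def minimizer_positions(seq, k, w, order):
    """Return sorted k-mer start positions selected by a (w, k) minimizer scheme.

    `order` is a hypothetical callback mapping a k-mer string to a sortable
    key; the k-mer with the smallest key in each window is selected, and ties
    are broken by taking the leftmost position.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        raise ValueError("sequence too short for one full window")
    # Precompute the ordering key of every k-mer in the sequence.
    keys = [order(seq[i:i + k]) for i in range(n_kmers)]
    selected = set()
    # Slide a window over w consecutive k-mers and pick the minimum in each.
    for start in range(n_kmers - w + 1):
        best = min(range(start, start + w), key=lambda i: (keys[i], i))
        selected.add(best)
    return sorted(selected)


def density(seq, k, w, order):
    """Fraction of k-mer positions selected on `seq` (lower is better)."""
    n_kmers = len(seq) - k + 1
    return len(minimizer_positions(seq, k, w, order)) / n_kmers


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    # Lexicographic ordering: the identity function on k-mer strings.
    print(minimizer_positions(seq, k=2, w=3, order=lambda kmer: kmer))
    print(density(seq, k=2, w=3, order=lambda kmer: kmer))
```

Changing the `order` callback changes which positions are selected and hence the density. That choice of ordering, for one specific target sequence, is the discrete search problem the abstract describes.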

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f750/9807081/2e4887988d66/cmb.2022.0275_figure1.jpg
