Differentiable Learning of Sequence-Specific Minimizer Schemes with DeepMinimizer.

Author information

Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

Publication information

J Comput Biol. 2022 Dec;29(12):1288-1304. doi: 10.1089/cmb.2022.0275. Epub 2022 Sep 12.

Abstract

Minimizers are widely used to sample representative k-mers from biological sequences in many applications, such as read mapping and taxonomy prediction. In most scenarios, it is desirable for the minimizer scheme to select as few k-mer positions as possible (i.e., to have a low density), because this reduces computation and memory cost. Despite the growing interest in minimizers, learning an effective scheme with optimal density is still an open question, as it requires solving an apparently challenging discrete optimization problem on the permutation space of k-mer orderings. Most existing schemes are designed to work well in expectation over random sequences, which limits their applicability to many practical tools. On the other hand, several methods have been proposed to construct minimizer schemes for a specific target sequence. These methods, however, only approximate the original objective with surrogate tasks that are likewise discrete and that fail to improve density significantly. This article introduces DeepMinimizer, the first continuous relaxation of the density-minimizing objective, which employs a novel deep learning twin architecture to simultaneously ensure both the validity and the performance of the minimizer scheme. Our surrogate objective is fully differentiable and, therefore, amenable to efficient gradient-based optimization using GPU computing. Finally, we demonstrate that DeepMinimizer discovers minimizer schemes that significantly outperform state-of-the-art constructions on human genomic sequences.
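To make the notion of density concrete, the following is a minimal Python sketch of a classical minimizer scheme, not the DeepMinimizer method itself. It rests on several assumptions that the abstract does not state: a window of w consecutive k-mers, a lexicographic k-mer ordering by default, ties broken by leftmost position, and density measured as the fraction of k-mer positions selected. The function names `minimizer_positions` and `density` are hypothetical and chosen only for illustration.

```python
def minimizer_positions(seq, k, w, order=None):
    """Return the k-mer start positions picked by a (w, k) minimizer scheme.

    `order` maps a k-mer to a sortable key; lexicographic order is used
    when it is omitted (an illustrative assumption, not the paper's choice).
    """
    if order is None:
        order = lambda kmer: kmer
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        raise ValueError("sequence too short to form one full window")
    selected = set()
    # Slide a window over every run of w consecutive k-mers.
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        # Pick the smallest k-mer under the ordering; ties go to the leftmost.
        best = min(window, key=lambda i: (order(seq[i:i + k]), i))
        selected.add(best)
    return selected


def density(seq, k, w, order=None):
    """Fraction of k-mer positions selected (one common convention)."""
    n_kmers = len(seq) - k + 1
    return len(minimizer_positions(seq, k, w, order)) / n_kmers


positions = minimizer_positions("ACGTACGTAC", k=3, w=3)
print(sorted(positions), density("ACGTACGTAC", k=3, w=3))
# prints: [0, 1, 4, 5] 0.5
```

Passing a different `order` function changes which positions are selected and therefore the density; searching over such orderings for a specific target sequence is the discrete optimization problem that the abstract describes.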


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f750/9807081/2e4887988d66/cmb.2022.0275_figure1.jpg
