Quick and effective approximation of in silico saturation mutagenesis experiments with first-order Taylor expansion.

Authors

Sasse Alexander, Chikina Maria, Mostafavi Sara

Affiliations

Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA.

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 16354, USA.

Publication information

iScience. 2024 Aug 23;27(9):110807. doi: 10.1016/j.isci.2024.110807. eCollection 2024 Sep 20.

Abstract

To understand the decision process of genomic sequence-to-function models, explainable AI algorithms determine the importance of each nucleotide in a given input sequence to the model's predictions and enable discovery of regulatory motifs for gene regulation. The most commonly applied method is in silico saturation mutagenesis (ISM) because its per-nucleotide importance scores can be intuitively understood as the computational counterpart to saturation mutagenesis experiments. While ISM is highly interpretable, it is computationally challenging to perform for many sequences, and becomes prohibitive as the length of the input sequences and the size of the model grow. Here, we use the first-order Taylor approximation to approximate ISM values from the model's gradient, which reduces its computation cost to a single forward pass for an input sequence. We show that the Taylor ISM (TISM) approximation is robust across different model ablations, random initializations, training parameters, and dataset sizes.
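The idea in the abstract can be illustrated with a minimal sketch. It assumes a one-hot encoded input and reads the first-order Taylor estimate of a substitution at position l from the reference base to an alternative base as the gradient entry of the alternative base minus the gradient entry of the reference base at that position. The toy model, its weights `W`, and all function names below are hypothetical stand-ins chosen for illustration; they are not the networks or code used by the authors.

```python
import math

BASES = "ACGT"

# Hypothetical weights for a toy model over a length-4 sequence
# (rows: positions, columns: A, C, G, T). Purely illustrative.
W = [
    [0.5, -0.2, 0.1, 0.0],
    [-0.3, 0.4, 0.0, 0.2],
    [0.1, 0.1, -0.5, 0.3],
    [0.0, -0.1, 0.2, 0.6],
]


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]


def linear_score(x):
    return sum(W[l][b] * x[l][b] for l in range(len(x)) for b in range(4))


def model(x):
    """Toy nonlinear sequence-to-function model: tanh of a linear score."""
    return math.tanh(linear_score(x))


def model_grad(x):
    """Analytic gradient of the toy model with respect to each input entry.

    With a real network this would come from automatic differentiation.
    """
    scale = 1.0 - math.tanh(linear_score(x)) ** 2
    return [[scale * W[l][b] for b in range(4)] for l in range(len(x))]


def exact_ism(seq):
    """Exact ISM: re-evaluate the model for every single-base substitution."""
    x0 = one_hot(seq)
    f0 = model(x0)
    scores = []
    for l in range(len(seq)):
        row = []
        for b in range(4):
            x = [r[:] for r in x0]
            x[l] = [0.0] * 4
            x[l][b] = 1.0
            row.append(model(x) - f0)  # 0 when b is the reference base
        scores.append(row)
    return scores


def taylor_ism(seq):
    """TISM sketch: one gradient evaluation, then a per-position difference."""
    g = model_grad(one_hot(seq))
    scores = []
    for l, base in enumerate(seq):
        ref = BASES.index(base)
        scores.append([g[l][b] - g[l][ref] for b in range(4)])
    return scores


if __name__ == "__main__":
    seq = "ACGT"
    for l, (e, t) in enumerate(zip(exact_ism(seq), taylor_ism(seq))):
        print(l, [round(v, 3) for v in e], [round(v, 3) for v in t])
```

In this sketch, exact ISM needs one model evaluation per substitution (three alternative bases at each position, plus the reference), whereas the Taylor variant reuses a single gradient for all positions. For this toy model the two agree in sign for every substitution but differ in magnitude, which is the kind of approximation error one would want to check on a real model.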

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5680/11404212/0791f243ec86/fx1.jpg
