Quick and effective approximation of in silico saturation mutagenesis experiments with first-order Taylor expansion.

Authors

Sasse Alexander, Chikina Maria, Mostafavi Sara

Affiliations

Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA.

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 16354, USA.

Publication information

iScience. 2024 Aug 23;27(9):110807. doi: 10.1016/j.isci.2024.110807. eCollection 2024 Sep 20.

DOI:10.1016/j.isci.2024.110807
PMID:39286491
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11404212/
Abstract

To understand the decision process of genomic sequence-to-function models, explainable AI algorithms determine the importance of each nucleotide in a given input sequence to the model's predictions and enable discovery of regulatory motifs for gene regulation. The most commonly applied method is in silico saturation mutagenesis (ISM) because its per-nucleotide importance scores can be intuitively understood as the computational counterpart to saturation mutagenesis experiments. While ISM is highly interpretable, it is computationally challenging to perform for many sequences, and becomes prohibitive as the length of the input sequences and the size of the model grow. Here, we use the first-order Taylor approximation to approximate ISM values from the model's gradient, which reduces its computational cost to a single forward pass for an input sequence. We show that the Taylor ISM (TISM) approximation is robust across different model ablations, random initializations, training parameters, and dataset sizes.
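
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. For a one-hot input, swapping the reference base at a position for base b changes exactly two input entries, so a first-order Taylor expansion estimates the change in the prediction as the gradient at (position, b) minus the gradient at (position, reference base). The toy `model` (a tanh over a position-weighted sum), its hand-written gradient, the weights, and all function names below are assumptions chosen only to keep the example self-contained; a real sequence-to-function model would obtain the gradient by automatic differentiation.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of length-4 one-hot rows."""
    return [[1.0 if b == base else 0.0 for b in BASES] for base in seq]


def model(x, weights, bias):
    """Hypothetical toy model: tanh of a position-weighted sum of the input."""
    z = bias + sum(w * v for w_row, x_row in zip(weights, x)
                   for w, v in zip(w_row, x_row))
    return math.tanh(z)


def model_gradient(x, weights, bias):
    """Analytic gradient of the toy model with respect to every input entry."""
    scale = 1.0 - model(x, weights, bias) ** 2  # derivative of tanh(z) w.r.t. z
    return [[scale * w for w in w_row] for w_row in weights]


def ism(x, weights, bias):
    """Exact ISM: one model evaluation per single-nucleotide substitution."""
    ref = model(x, weights, bias)
    scores = []
    for pos in range(len(x)):
        pos_scores = []
        for b in range(4):
            mutated = [row[:] for row in x]
            mutated[pos] = [1.0 if k == b else 0.0 for k in range(4)]
            pos_scores.append(model(mutated, weights, bias) - ref)
        scores.append(pos_scores)
    return scores


def tism(x, weights, bias):
    """First-order Taylor estimate of ISM from one gradient evaluation."""
    grad = model_gradient(x, weights, bias)
    scores = []
    for x_row, g_row in zip(x, grad):
        ref_idx = x_row.index(1.0)  # column of the reference base
        scores.append([g - g_row[ref_idx] for g in g_row])
    return scores


# Illustrative weights (made up for this sketch, not taken from the paper).
weights = [
    [0.3, -0.1, 0.2, 0.0],
    [-0.2, 0.1, 0.0, 0.3],
    [0.1, 0.0, -0.3, 0.2],
    [0.0, 0.2, -0.1, 0.1],
]
bias = 0.0
x = one_hot("ACGT")

ism_scores = ism(x, weights, bias)
tism_scores = tism(x, weights, bias)
max_abs_error = max(abs(a - t) for a_row, t_row in zip(ism_scores, tism_scores)
                    for a, t in zip(a_row, t_row))
print(f"max |ISM - TISM| = {max_abs_error:.4f}")
```

In this sketch `ism` calls the model once per substitution (3 × sequence length, plus the reference), whereas `tism` needs only one gradient evaluation. The two agree exactly for a linear model and diverge as the model becomes more nonlinear around the input.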

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5680/11404212/c816a6a77f76/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5680/11404212/0791f243ec86/fx1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5680/11404212/c6ce21faa513/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5680/11404212/b50c730bb586/gr2.jpg
