Global importance analysis: An interpretability method to quantify importance of genomic features in deep neural networks.

Affiliations

Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America.

Department of Biostatistics, Harvard University, Cambridge, Massachusetts, United States of America.

Publication information

PLoS Comput Biol. 2021 May 13;17(5):e1008925. doi: 10.1371/journal.pcbi.1008925. eCollection 2021 May.

DOI:10.1371/journal.pcbi.1008925

PMID:33983921

Full-text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8118286/

Abstract

Deep neural networks (DNNs) have demonstrated improved performance at predicting the sequence specificities of DNA- and RNA-binding proteins compared to previous methods that rely on k-mers and position weight matrices. To gain insights into why a DNN makes a given prediction, model interpretability methods, such as attribution methods, can be employed to identify motif-like representations along a given sequence. Because explanations are given on an individual sequence basis and can vary substantially across sequences, deducing generalizable trends across the dataset and quantifying their effect size remains a challenge. Here we introduce global importance analysis (GIA), a model interpretability method that quantifies the population-level effect size that putative patterns have on model predictions. GIA provides an avenue to quantitatively test hypotheses about putative patterns and their interactions with other patterns, and to map out specific functions the network has learned. As a case study, we demonstrate the utility of GIA on the computational task of predicting RNA-protein interactions from sequence. We first introduce a convolutional network that we call ResidualBind and benchmark its performance against previous methods on RNAcompete data. Using GIA, we then demonstrate that, in addition to sequence motifs, ResidualBind learns a model that considers the number of motifs, their spacing, and sequence context, such as RNA secondary structure and GC-bias.
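The abstract describes GIA as measuring a population-level effect size rather than a per-sequence attribution. One way to read that idea is sketched below: embed a putative pattern into many background sequences and compare the model's average prediction with and without it. This is a minimal illustration, not the authors' implementation. The uniform background distribution, the `toy_model` scoring function, the motif `UGCAUG`, the sequence length, and the embedding position are all assumptions chosen for the example.

```python
import numpy as np

ALPHABET = np.array(list("ACGU"))


def sample_background(n, length, rng):
    """Draw n random RNA sequences with uniform nucleotide frequencies (assumed)."""
    return rng.choice(ALPHABET, size=(n, length))


def embed(seqs, pattern, position):
    """Return a copy of seqs with `pattern` written in starting at `position`."""
    out = seqs.copy()
    out[:, position:position + len(pattern)] = list(pattern)
    return out


def toy_model(seqs, motif="UGCAUG"):
    """Hypothetical stand-in for a trained network: counts motif occurrences."""
    strings = ["".join(row) for row in seqs]
    return np.array([float(s.count(motif)) for s in strings])


def global_importance(model, background, pattern, position):
    """Mean prediction with the pattern embedded minus mean prediction without it."""
    with_pattern = model(embed(background, pattern, position))
    without_pattern = model(background)
    return with_pattern.mean() - without_pattern.mean()


rng = np.random.default_rng(0)
background = sample_background(n=2000, length=41, rng=rng)

# Effect of the pattern the toy model responds to, and of a control pattern.
effect_motif = global_importance(toy_model, background, "UGCAUG", 18)
effect_control = global_importance(toy_model, background, "AAAAAA", 18)
print(f"motif effect:   {effect_motif:.3f}")
print(f"control effect: {effect_control:.3f}")
```

Because the same background sequences are scored with and without the pattern, the difference in means isolates the pattern's average contribution across the sampled population. The same function could be reused to probe the other properties the abstract mentions, for example by embedding two copies of a pattern at varying spacing, with a trained model substituted for `toy_model`.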


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/662b/8118286/ce9420179583/pcbi.1008925.g001.jpg
