NeuronMotif: Deciphering cis-regulatory codes by layer-wise demixing of deep neural networks.

Affiliations

Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China.

Beijing Academy of Artificial Intelligence, Beijing 100084, China.

Publication information

Proc Natl Acad Sci U S A. 2023 Apr 11;120(15):e2216698120. doi: 10.1073/pnas.2216698120. Epub 2023 Apr 6.

DOI:10.1073/pnas.2216698120

PMID:37023129

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10104575/

Abstract

Discovering DNA regulatory sequence motifs and their relative positions is vital to understanding the mechanisms of gene expression regulation. Although deep convolutional neural networks (CNNs) have achieved great success in predicting cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remained difficult. We show that the main difficulty is due to the problem of multifaceted neurons which respond to multiple types of sequence patterns. Since existing interpretation methods were mainly designed to visualize the class of sequences that can activate the neuron, the resulting visualization will correspond to a mixture of patterns. Such a mixture is usually difficult to interpret without resolving the mixed patterns. We propose the NeuronMotif algorithm to interpret such neurons. Given any convolutional neuron (CN) in the network, NeuronMotif first generates a large sample of sequences capable of activating the CN, which typically consists of a mixture of patterns. Then, the sequences are "demixed" in a layer-wise manner by backward clustering of the feature maps of the involved convolutional layers. NeuronMotif can output the sequence motifs, and the syntax rules governing their combinations are depicted by position weight matrices organized in tree structures. Compared to existing methods, the motifs found by NeuronMotif have more matches to known motifs in the JASPAR database. The higher-order patterns uncovered for deep CNs are supported by the literature and ATAC-seq footprinting. Overall, NeuronMotif enables the deciphering of cis-regulatory codes from deep CNs and enhances the utility of CNN in genome interpretation.
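The demixing idea in the abstract can be illustrated with a toy sketch. This is not the NeuronMotif implementation: it assumes two made-up motifs (`ATGACTCA` and `CCACGTGG`), simulates "activating" sequences as noisy copies of them, and swaps the backward clustering of feature maps for a plain 2-means clustering of one-hot encoded sequences. All function names are hypothetical. The point is only to show why averaging a mixed sample blurs the motif, while clustering first recovers one position frequency matrix per pattern.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seqs):
    """Encode equal-length DNA strings as an (n, length, 4) array."""
    idx = np.array([[ALPHABET.index(c) for c in s] for s in seqs])
    return np.eye(4)[idx]

def noisy_copies(motif, n, rng, mutation_rate=0.1):
    """Toy stand-in for sequences that activate a neuron (assumption)."""
    out = []
    for _ in range(n):
        chars = [ALPHABET[rng.integers(4)] if rng.random() < mutation_rate else c
                 for c in motif]
        out.append("".join(chars))
    return out

def pfm(x):
    """Position frequency matrix: per-position base frequencies, shape (length, 4)."""
    return x.mean(axis=0)

def consensus(x):
    """Most frequent base at each position."""
    return "".join(ALPHABET[i] for i in pfm(x).argmax(axis=1))

def two_means(x, n_iter=10):
    """Plain 2-means on flattened one-hot vectors with farthest-point initialization."""
    flat = x.reshape(len(x), -1)
    first = flat[0]
    second = flat[((flat - first) ** 2).sum(axis=1).argmax()]
    centers = np.stack([first, second])
    for _ in range(n_iter):
        dist = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        centers = np.stack([flat[labels == k].mean(axis=0) for k in range(2)])
    return labels

rng = np.random.default_rng(0)
motifs = ["ATGACTCA", "CCACGTGG"]  # hypothetical patterns for one multifaceted neuron
seqs = noisy_copies(motifs[0], 100, rng) + noisy_copies(motifs[1], 100, rng)
x = one_hot(seqs)

mixed = pfm(x)                      # averaging the whole sample blurs the two patterns
labels = two_means(x)               # demix the sample first
demixed = sorted(consensus(x[labels == k]) for k in range(2))

print("max base frequency per position (mixed):", mixed.max(axis=1).round(2))
print("consensus per cluster (demixed):", demixed)
```

In the mixed matrix, positions where the two motifs disagree have no dominant base, which is the kind of uninterpretable mixture the abstract describes. The real method applies the clustering layer by layer to feature maps, which this sketch does not attempt.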

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a44c/10104575/6293e66fe483/pnas.2216698120fig01.jpg

Similar articles

Unveil cis-acting combinatorial mRNA motifs by interpreting deep neural network.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i381-i389. doi: 10.1093/bioinformatics/btae262.

A Method for Predicting DNA Motif Length Based On Deep Learning.

IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):61-73. doi: 10.1109/TCBB.2022.3158471. Epub 2023 Feb 3.

A deep dive into understanding tumor foci classification using multiparametric MRI based on convolutional neural network.

Med Phys. 2020 Sep;47(9):4077-4086. doi: 10.1002/mp.14255. Epub 2020 Jun 12.

Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays.

PLoS One. 2019 Jun 17;14(6):e0218073. doi: 10.1371/journal.pone.0218073. eCollection 2019.

Discovering Gene Regulatory Elements Using Coverage-Based Heuristics.

IEEE/ACM Trans Comput Biol Bioinform. 2018 Jul-Aug;15(4):1290-1300. doi: 10.1109/TCBB.2015.2496261. Epub 2015 Oct 30.

RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach.

BMC Bioinformatics. 2017 Feb 28;18(1):136. doi: 10.1186/s12859-017-1561-8.

Representation learning of genomic sequence motifs with convolutional neural networks.

PLoS Comput Biol. 2019 Dec 19;15(12):e1007560. doi: 10.1371/journal.pcbi.1007560. eCollection 2019 Dec.

Identifying complex motifs in massive omics data with a variable-convolutional layer in deep neural network.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab233.

Condition-specific coregulation with cis-regulatory motifs and modules in the mouse genome.

Genomics. 2006 Apr;87(4):500-8. doi: 10.1016/j.ygeno.2005.11.015. Epub 2006 Jan 23.

Cited by

Synthetic promoter design in Escherichia coli based on multinomial diffusion model.

iScience. 2024 Oct 18;27(11):111207. doi: 10.1016/j.isci.2024.111207. eCollection 2024 Nov 15.

Decoding biology with massively parallel reporter assays and machine learning.

Genes Dev. 2024 Oct 16;38(17-20):843-865. doi: 10.1101/gad.351800.124.

Dynamic and Static Regulation of Nicotinamide Adenine Dinucleotide Phosphate: Strategies, Challenges, and Future Directions in Metabolic Engineering.

Molecules. 2024 Aug 3;29(15):3687. doi: 10.3390/molecules29153687.

AI-Assisted Rational Design and Activity Prediction of Biological Elements for Optimizing Transcription-Factor-Based Biosensors.

Molecules. 2024 Jul 26;29(15):3512. doi: 10.3390/molecules29153512.

Unveil cis-acting combinatorial mRNA motifs by interpreting deep neural network.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i381-i389. doi: 10.1093/bioinformatics/btae262.

Enhancing glucaric acid production from myo-inositol in Escherichia coli by eliminating cell-to-cell variation.

Appl Environ Microbiol. 2024 Jun 18;90(6):e0014924. doi: 10.1128/aem.00149-24. Epub 2024 May 29.

Deep flanking sequence engineering for efficient promoter design using DeepSEED.

Nat Commun. 2023 Oct 9;14(1):6309. doi: 10.1038/s41467-023-41899-y.
