NeuronMotif: Deciphering cis-regulatory codes by layer-wise demixing of deep neural networks.

Affiliations

Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China.

Beijing Academy of Artificial Intelligence, Beijing 100084, China.

Publication information

Proc Natl Acad Sci U S A. 2023 Apr 11;120(15):e2216698120. doi: 10.1073/pnas.2216698120. Epub 2023 Apr 6.

Abstract

Discovering DNA regulatory sequence motifs and their relative positions is vital to understanding the mechanisms of gene expression regulation. Although deep convolutional neural networks (CNNs) have achieved great success in predicting cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remained difficult. We show that the main difficulty is due to the problem of multifaceted neurons which respond to multiple types of sequence patterns. Since existing interpretation methods were mainly designed to visualize the class of sequences that can activate the neuron, the resulting visualization will correspond to a mixture of patterns. Such a mixture is usually difficult to interpret without resolving the mixed patterns. We propose the NeuronMotif algorithm to interpret such neurons. Given any convolutional neuron (CN) in the network, NeuronMotif first generates a large sample of sequences capable of activating the CN, which typically consists of a mixture of patterns. Then, the sequences are "demixed" in a layer-wise manner by backward clustering of the feature maps of the involved convolutional layers. NeuronMotif can output the sequence motifs, and the syntax rules governing their combinations are depicted by position weight matrices organized in tree structures. Compared to existing methods, the motifs found by NeuronMotif have more matches to known motifs in the JASPAR database. The higher-order patterns uncovered for deep CNs are supported by the literature and ATAC-seq footprinting. Overall, NeuronMotif enables the deciphering of cis-regulatory codes from deep CNs and enhances the utility of CNN in genome interpretation.
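
The mixing problem described in the abstract can be illustrated with a toy sketch. This is not the NeuronMotif implementation: the abstract does not specify the clustering procedure, and NeuronMotif clusters feature maps layer by layer, whereas this sketch runs a plain 2-means directly on one-hot sequences. The two motifs, the noise rate, and all function names are hypothetical choices made for illustration. The sketch shows that a position weight matrix (PWM) built from a mixture of two patterns carries less information than the PWMs obtained after the sequences are separated.

```python
import numpy as np

BASES = "ACGT"

def sample(motif, n, rng, noise=0.1):
    """Draw n noisy copies of a motif (hypothetical stand-in for activating sequences)."""
    seqs = []
    for _ in range(n):
        s = [rng.choice(list(BASES)) if rng.random() < noise else b for b in motif]
        seqs.append("".join(s))
    return seqs

def one_hot(seqs):
    """Encode equal-length DNA strings as an (n, length, 4) array."""
    idx = np.array([[BASES.index(b) for b in s] for s in seqs])
    return np.eye(4)[idx]

def pwm(x, pseudocount=0.5):
    """Per-position base frequencies, shape (length, 4), from one-hot sequences."""
    counts = x.sum(axis=0) + pseudocount
    return counts / counts.sum(axis=1, keepdims=True)

def information_content(p):
    """Total information content in bits, relative to a uniform background."""
    return float((2.0 + (p * np.log2(p)).sum(axis=1)).sum())

def consensus(p):
    """Most frequent base at each position."""
    return "".join(BASES[i] for i in p.argmax(axis=1))

def two_means(x, n_iter=20):
    """Plain 2-means on flattened one-hot vectors with farthest-point initialization.

    Assumption: this replaces NeuronMotif's backward clustering of feature maps.
    """
    flat = x.reshape(len(x), -1)
    first = flat[0]
    second = flat[((flat - first) ** 2).sum(axis=1).argmax()]
    centers = np.stack([first, second])
    for _ in range(n_iter):
        dists = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():  # keep the old center if a cluster empties
                centers[k] = flat[labels == k].mean(axis=0)
    return labels

rng = np.random.default_rng(0)
# A "multifaceted neuron" is imitated by pooling sequences from two made-up motifs.
seqs = sample("ACGTAC", 100, rng) + sample("TTGACA", 100, rng)
x = one_hot(seqs)

mixed = pwm(x)                      # PWM of the undivided mixture
labels = two_means(x)               # demixing step (toy version)
demixed = [pwm(x[labels == k]) for k in range(2)]

print("mixed IC (bits):", round(information_content(mixed), 2))
for p in demixed:
    print(consensus(p), "IC (bits):", round(information_content(p), 2))
```

In this toy setting the pooled PWM shows two competing bases at most positions, while each demixed PWM recovers one of the two planted motifs. Real CNN neurons require the layer-wise treatment described in the abstract, because the patterns they respond to can shift in position and combine across layers.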

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a44c/10104575/6293e66fe483/pnas.2216698120fig01.jpg
