An intrinsically interpretable neural network architecture for sequence-to-function learning.

Affiliations

Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States.

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States.

Publication information

Bioinformatics. 2023 Jun 30;39(Suppl 1):i413-i422. doi: 10.1093/bioinformatics/btad271.

Abstract

MOTIVATION

Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post hoc analyses, and even then, one often cannot explain the internal mechanics of highly parameterized models. Here, we introduce a deep learning architecture called the totally interpretable sequence-to-function model (tiSFM). tiSFM outperforms standard multilayer convolutional models while using fewer parameters. Additionally, although tiSFM is technically a multilayer neural network, its internal model parameters are intrinsically interpretable in terms of relevant sequence motifs.
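The abstract does not spell out the layers, so the following is only a minimal sketch of the general idea of motif-interpretable parameters, not the tiSFM implementation. It assumes a first layer that scans a one-hot DNA sequence with fixed (not learned) position weight matrices, max-pools each scan track, and combines the pooled scores with one weight per named motif. All function names, the toy motif, and the uniform background are hypothetical and chosen for illustration; the actual architecture is in the paper and the linked repository.

```python
import math
from typing import Dict, List, Sequence

ALPHABET = "ACGT"  # column order used for one-hot vectors and PWM rows


def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as length-4 vectors (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq.upper()]


def log_odds_pwm(probs: Sequence[Sequence[float]], background: float = 0.25) -> List[List[float]]:
    """Turn a position probability matrix (rows = positions, columns = ACGT)
    into log2 odds against a uniform background (assumption). Probabilities must be > 0."""
    return [[math.log2(p / background) for p in row] for row in probs]


def motif_scan(x: List[List[float]], pwm: List[List[float]]) -> List[float]:
    """Slide a frozen PWM along the forward strand only, i.e. a 1D convolution
    whose filter is a known motif rather than a learned weight matrix."""
    width = len(pwm)
    scores = []
    for start in range(len(x) - width + 1):
        s = 0.0
        for offset in range(width):
            s += sum(xi * wi for xi, wi in zip(x[start + offset], pwm[offset]))
        scores.append(s)
    return scores


def predict(seq: str, motifs: Dict[str, List[List[float]]],
            weights: Dict[str, float], bias: float = 0.0) -> float:
    """Hypothetical readout: max-pool each motif track, then a linear layer with
    exactly one weight per motif, so each weight reads as that motif's contribution."""
    x = one_hot(seq)
    total = bias
    for name, pwm in motifs.items():
        if len(x) < len(pwm):
            raise ValueError(f"sequence shorter than motif {name!r}")
        total += weights[name] * max(motif_scan(x, pwm))
    return total


if __name__ == "__main__":
    # Toy motif strongly preferring "TGA" (illustrative values, not a real database entry).
    toy = log_odds_pwm([[0.01, 0.01, 0.01, 0.97],
                        [0.01, 0.01, 0.97, 0.01],
                        [0.97, 0.01, 0.01, 0.01]])
    print(predict("CCTGACC", {"toy": toy}, {"toy": 1.0}))
    print(predict("CCCCCCC", {"toy": toy}, {"toy": 1.0}))
```

In this toy setup a positive weight would mark a motif whose presence raises the predicted signal and a negative weight one that lowers it; because the filters are fixed motifs, no post hoc attribution step is needed to name what each channel detects.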

RESULTS

We analyze published open chromatin measurements across cell types of the hematopoietic lineage and demonstrate that tiSFM outperforms a state-of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context-specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B cells and Rorc for innate lymphoid cells. tiSFM's model parameters have biologically meaningful interpretations, and we show the utility of our approach on the complex task of predicting the change in epigenetic state associated with a developmental transition.

AVAILABILITY AND IMPLEMENTATION

The source code, implemented in Python and including scripts for the analysis of key findings, is available at https://github.com/boooooogey/ATAConv.


