Integrating distal and proximal information to predict gene expression via a densely connected convolutional neural network.

Affiliations

MOE Key Laboratory of Bioinformatics, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China.

CEMS, NCMIS, MDIS, Academy of Mathematics and Systems Science, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing 100080, China.

Publication information

Bioinformatics. 2020 Jan 15;36(2):496-503. doi: 10.1093/bioinformatics/btz562.

Abstract

MOTIVATION

Interactions among cis-regulatory elements such as enhancers and promoters are the main driving forces shaping context-specific chromatin structure and gene expression. Although computational methods exist for predicting gene expression from genomic and epigenomic information, most of them neglect long-range enhancer-promoter interactions because it is difficult to link regulatory enhancers precisely to their target genes. Recently, HiChIP, a novel high-throughput experimental approach, has generated comprehensive data on high-resolution interactions between promoters and distal enhancers. Moreover, many studies suggest that deep learning achieves state-of-the-art performance in epigenomic signal prediction, thereby promoting the understanding of regulatory elements. Given these two factors, we integrate proximal promoter sequences and HiChIP distal enhancer-promoter interactions to accurately predict gene expression.

RESULTS

We propose DeepExpression, a densely connected convolutional neural network, to predict gene expression using both promoter sequences and enhancer-promoter interactions. We demonstrate that our model consistently outperforms baseline methods, both in the classification of binary gene expression status and in the regression of continuous gene expression levels, in cross-validation experiments as well as cross-cell line predictions. We show that promoter sequence information is more informative than the experimental enhancer information, and that the enhancer-promoter interactions within ±100 kbp of a gene's TSS are the most beneficial. Finally, we visualize motifs in both promoter and enhancer regions and show that the identified sequence signatures match known motifs. We expect to see a wide spectrum of applications using HiChIP data in deciphering the mechanism of gene regulation.
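
To make the ±100 kbp window concrete, the following is a minimal Python sketch of how enhancer-promoter interactions might be restricted to those near a gene's TSS. It is not taken from the DeepExpression code: the `Interaction` record, the `strength` field, the function names, and the choice to measure distance from the enhancer midpoint are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record for one enhancer linked to a promoter."""
    enhancer_start: int  # genomic coordinate in bp
    enhancer_end: int    # genomic coordinate in bp
    strength: float      # assumed contact score, e.g. derived from HiChIP


def within_window(tss: int, interaction: Interaction,
                  window_bp: int = 100_000) -> bool:
    """Return True if the enhancer midpoint lies within +/- window_bp of the TSS.

    Using the midpoint is an assumption; other distance definitions are possible.
    """
    midpoint = (interaction.enhancer_start + interaction.enhancer_end) // 2
    return abs(midpoint - tss) <= window_bp


def filter_interactions(tss: int, interactions: List[Interaction],
                        window_bp: int = 100_000) -> List[Interaction]:
    """Keep only the interactions whose enhancer falls inside the window."""
    return [i for i in interactions if within_window(tss, i, window_bp)]


# Made-up coordinates, for illustration only.
example_tss = 1_000_000
example_interactions = [
    Interaction(950_000, 951_000, 3.2),      # midpoint 49,500 bp upstream: kept
    Interaction(1_099_000, 1_101_000, 1.5),  # midpoint exactly 100,000 bp away: kept
    Interaction(1_250_000, 1_252_000, 4.0),  # midpoint 251,000 bp away: dropped
]
kept = filter_interactions(example_tss, example_interactions)
```

Varying `window_bp` in a sketch like this is one way to compare how much interactions at different distances contribute to a downstream model.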

AVAILABILITY AND IMPLEMENTATION

DeepExpression is freely available at https://github.com/wanwenzeng/DeepExpression.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
