Characterizing Promoter and Enhancer Sequences by a Deep Learning Method.

Author information

Zeng Xin, Park Sung-Joon, Nakai Kenta

Affiliations

Department of Computational Biology and Medical Science, The University of Tokyo, Kashiwa, Japan.

Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.

Publication information

Front Genet. 2021 Jun 15;12:681259. doi: 10.3389/fgene.2021.681259. eCollection 2021.

Abstract

Promoters and enhancers are well-known regulatory elements modulating gene expression. As confirmed by high-throughput sequencing technologies, these regulatory elements are bidirectionally transcribed. That is, promoters produce stable mRNA in the sense direction and unstable RNA in the antisense direction, while enhancers transcribe unstable RNA in both directions. Although it is thought that enhancers and promoters share a similar architecture of transcription start sites (TSSs), how the transcriptional machinery distinctly uses these genomic regions as promoters or enhancers remains unclear. To address this issue, we developed a deep learning (DL) method by utilizing a convolutional neural network (CNN) and the saliency algorithm. In comparison with other classifiers, our CNN presented higher predictive performance, suggesting the overarching importance of the high-order sequence features, captured by the CNN. Moreover, our method revealed that there are substantial sequence differences between the enhancers and promoters. Remarkably, the 20-120 bp downstream regions from the center of bidirectional TSSs seemed to contribute to the RNA stability. These regions in promoters tend to have a larger number of guanines and cytosines compared to those in enhancers, and this feature contributed to the classification of the regulatory elements. Our CNN-based method can capture the complex TSS architectures. We found that the genomic regions around TSSs for promoters and enhancers contribute to RNA stability and show GC-biased characteristics as a critical determinant for promoter TSSs.
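The GC-bias observation can be illustrated with a minimal sketch that measures the G+C fraction in the window 20-120 bp downstream of the center of a pair of bidirectional TSSs. This is not the authors' pipeline: the function names, the 0-based `center` index, the half-open window convention, and the assumption that the input sequence is already oriented 5'→3' on the sense strand are all hypothetical choices made for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored; an empty or fully ambiguous
    sequence yields 0.0 rather than raising an error.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def downstream_window(seq: str, center: int, start: int = 20, end: int = 120) -> str:
    """Return the bases from center+start up to, but not including, center+end.

    Assumptions (not taken from the paper): `center` is a 0-based index of the
    midpoint between the two TSSs, and `seq` reads 5'->3' on the sense strand,
    so "downstream" means increasing index. The window is clipped to the
    sequence boundaries.
    """
    lo = max(center + start, 0)
    hi = min(center + end, len(seq))
    return seq[lo:hi]


def downstream_gc(seq: str, center: int) -> float:
    """G+C fraction of the default 20-120 bp downstream window."""
    return gc_fraction(downstream_window(seq, center))


if __name__ == "__main__":
    # Toy sequences invented for illustration; they are not real genomic data.
    promoter_like = "AT" * 50 + "GCGCAT" * 30
    enhancer_like = "AT" * 50 + "ATATGC" * 30
    print(downstream_gc(promoter_like, center=100))
    print(downstream_gc(enhancer_like, center=100))
```

Comparing this value between sets of promoter and enhancer sequences gives a simple hand-crafted baseline feature. The CNN and saliency analysis described in the abstract go further by learning which positions matter rather than fixing the window in advance.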

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84c5/8239401/6c6b47d17f6a/fgene-12-681259-g001.jpg
