Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
Ministry of Education Key Laboratory of Bioinformatics, Tsinghua University, Beijing, China.
Proc Natl Acad Sci U S A. 2021 Feb 9;118(6). doi: 10.1073/pnas.2007450118.
RNA polymerase II (Pol II) generally pauses at certain positions along gene bodies, thereby interrupting the transcription elongation process; this pausing is often coupled with important biological functions, such as precursor mRNA splicing and gene expression regulation. Characterizing transcriptional elongation dynamics can thus help us understand many essential biological processes in eukaryotic cells. However, experimentally measuring Pol II elongation rates is generally time- and resource-consuming. We developed PEPMAN (polymerase II elongation pausing modeling through attention-based deep neural network), a deep learning-based model that accurately predicts Pol II pausing sites from native elongating transcript sequencing (NET-seq) data. By taking full advantage of the attention mechanism, PEPMAN is able to decipher important sequence features underlying Pol II pausing. More importantly, we demonstrated that analyzing PEPMAN predictions around various types of alternative splicing sites can provide useful clues for understanding cotranscriptional splicing events. In addition, associating the PEPMAN predictions with different epigenetic features can help reveal important factors related to the transcription elongation process. Together, these results demonstrate that PEPMAN is a useful and effective tool for modeling transcription elongation and understanding the related biological factors from available high-throughput sequencing data.