Department of Statistical Science, The Graduate University for Advanced Studies (SOKENDAI), Tachikawa, Japan.
Department of Statistical Modeling, The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Japan.
Bioinformatics. 2019 Jun 1;35(11):1877-1884. doi: 10.1093/bioinformatics/bty886.
Sequencing total RNA without poly-A selection enables us to obtain a transcriptomic profile of nascent RNAs undergoing transcription with co-transcriptional splicing. In general, the RNA-seq reads exhibit a sawtooth pattern across a gene, which is characterized by a monotonically decreasing gradient across introns in the 5'-3' direction and by substantially higher read levels in exonic regions. Such patterns result from the underlying process of transcription elongation by RNA polymerase II, which traverses the DNA strand in the 5'-3' direction as it performs a complex series of mRNA synthesis and processing steps. Therefore, total RNA-seq data could be used to infer the rate of transcription elongation by solving the inverse problem.
Though solving the inverse problem in total RNA-seq has great potential, statistical methods have not yet been fully developed. We demonstrate to what extent the newly developed method can be useful. The objective is to reconstruct the spatial distribution of transcription elongation rates in a gene from a given noisy, sawtooth-like profile. It is necessary to recover the signal source of the elongation rates separately from several types of nuisance factors, such as unobserved modes of co-transcriptionally occurring mRNA splicing, which exert significant influences on the sawtooth shape. The present method was tested using published total RNA-seq data derived from mouse embryonic stem cells. We investigated the spatial characteristics of the estimated elongation rates, focusing especially on their relation to promoter-proximal pausing of RNA polymerase II, nucleosome occupancy and histone modification patterns.
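To make the inverse problem concrete, the following is a minimal toy sketch in Python. It is not the PolSter method. It assumes a deliberately simplified steady-state model in which polymerases initiate at a constant rate and each intron is removed as soon as the polymerase reaches its 3' end, so the expected coverage at an intronic position is proportional to the time the polymerase still needs to reach the end of the intron. The function names `intron_coverage` and `rates_from_coverage` and the `initiation` parameter are hypothetical and introduced only for illustration.

```python
def intron_coverage(rates, initiation=1.0):
    """Forward model (toy assumption): expected coverage along one intron.

    rates[i] is the elongation rate (bases per unit time) at position i,
    ordered 5' to 3'. Coverage at position i is proportional to the time
    the polymerase needs to travel from i to the end of the intron.
    """
    coverage = [0.0] * len(rates)
    remaining_time = 0.0
    # Accumulate traversal time from the 3' end back to the 5' end.
    for i in range(len(rates) - 1, -1, -1):
        remaining_time += 1.0 / rates[i]
        coverage[i] = initiation * remaining_time
    return coverage


def rates_from_coverage(coverage, initiation=1.0):
    """Naive inversion: the local rate is inversely proportional to the
    drop in coverage between adjacent positions. Exact only without noise."""
    rates = []
    for i in range(len(coverage)):
        next_cov = coverage[i + 1] if i + 1 < len(coverage) else 0.0
        rates.append(initiation / (coverage[i] - next_cov))
    return rates


# A fast region followed by a slow region: the gradient steepens where
# the polymerase slows down.
true_rates = [2.0] * 5 + [0.5] * 5
profile = intron_coverage(true_rates)
estimated = rates_from_coverage(profile)
```

Under this toy model the inversion is exact, but it divides by differences of adjacent coverage values, so any read-count noise or unmodeled splicing behavior would be strongly amplified. That sensitivity is one way to see why a dedicated statistical method is needed for real data.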
A C implementation of PolSter and sample data are available at https://github.com/yoshida-lab/PolSter.
Supplementary data are available at Bioinformatics online.