Honkela Antti, Peltonen Jaakko, Topa Hande, Charapitsa Iryna, Matarese Filomena, Grote Korbinian, Stunnenberg Hendrik G, Reid George, Lawrence Neil D, Rattray Magnus
Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, 00014 Helsinki, Finland;
Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, 00076 Espoo, Finland; School of Information Sciences, University of Tampere, 33014 Tampere, Finland;
Proc Natl Acad Sci U S A. 2015 Oct 20;112(42):13115-20. doi: 10.1073/pnas.1420404112. Epub 2015 Oct 5.
Genes with similar transcriptional activation kinetics can display very different temporal mRNA profiles because of differences in transcription time, degradation rate, and RNA-processing kinetics. Recent studies have shown that a splicing-associated RNA production delay can be significant. To investigate this issue more generally, it is useful to develop methods applicable to genome-wide datasets. We introduce a joint model of transcriptional activation and mRNA accumulation that can be used for inference of transcription rate, RNA production delay, and degradation rate given data from high-throughput sequencing time course experiments. We combine a mechanistic differential equation model with a nonparametric statistical modeling approach allowing us to capture a broad range of activation kinetics, and we use Bayesian parameter estimation to quantify the uncertainty in estimates of the kinetic parameters. We apply the model to data from estrogen receptor α activation in the MCF-7 breast cancer cell line. We use RNA polymerase II ChIP-Seq time course data to characterize transcriptional activation and mRNA-Seq time course data to quantify mature transcripts. We find that 11% of genes with a good signal in the data display a delay of more than 20 min between completing transcription and mature mRNA production. The genes displaying these long delays are significantly more likely to be short. We also find a statistical association between high delay and late intron retention in pre-mRNA data, indicating significant splicing-associated production delays in many genes.
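A small forward simulation helps show how identical activation kinetics can yield different mRNA time profiles. This is a minimal sketch under stated assumptions, not the authors' implementation. We assume that mature mRNA m(t) follows a linear first-order equation with a delayed input, dm/dt = β0 + β·p(t − Δ) − α·m(t). Here p(t) is the transcriptional activation profile (for example, as read out from Pol II ChIP-Seq), β0 and β are basal and activated production rates, Δ is the RNA production delay, and α is the degradation rate. The names `simulate_mrna`, `step_response` and `unit_step` are hypothetical. The unit step standing in for p(t) is purely illustrative: the paper models activation nonparametrically and estimates the parameters in a Bayesian framework.

```python
import math
from typing import Callable, List


def unit_step(t: float) -> float:
    """Illustrative activation profile (assumption): off before t = 0, fully on afterwards."""
    return 1.0 if t >= 0.0 else 0.0


def simulate_mrna(
    p: Callable[[float], float],
    alpha: float,
    beta: float,
    delay: float,
    beta0: float = 0.0,
    m0: float = 0.0,
    t_end: float = 240.0,
    dt: float = 0.01,
) -> List[float]:
    """Integrate dm/dt = beta0 + beta * p(t - delay) - alpha * m(t) by forward Euler.

    Returns a list m with m[k] approximating m(k * dt), for k = 0 .. t_end / dt.
    Behaviour of p for negative arguments (before the stimulus) is up to the caller.
    """
    n_steps = int(round(t_end / dt))
    m = [m0]
    for k in range(n_steps):
        t = k * dt
        # Production uses the activation level Delta minutes earlier; degradation is first order.
        dm_dt = beta0 + beta * p(t - delay) - alpha * m[-1]
        m.append(m[-1] + dt * dm_dt)
    return m


def step_response(
    t: float, alpha: float, beta: float, delay: float, beta0: float = 0.0, m0: float = 0.0
) -> float:
    """Closed-form m(t) when p is a unit step at t = 0 (used to check the simulation)."""
    # Relaxation from m0 towards the basal steady state beta0 / alpha.
    baseline = beta0 / alpha + (m0 - beta0 / alpha) * math.exp(-alpha * t)
    if t < delay:
        return baseline
    # After the delay, the activated term rises towards an extra plateau of beta / alpha.
    return baseline + beta / alpha * (1.0 - math.exp(-alpha * (t - delay)))


if __name__ == "__main__":
    alpha = math.log(2) / 60.0  # assumed 60 min mRNA half-life, per minute
    for delay in (0.0, 30.0):
        m = simulate_mrna(unit_step, alpha=alpha, beta=1.0, delay=delay)
        samples = [round(m[int(t / 0.01)], 2) for t in (15, 30, 60, 120, 240)]
        print(f"delay={delay:>4} min  m(15, 30, 60, 120, 240 min) = {samples}")
```

Running the same step activation with `delay=0.0` and `delay=30.0` shows the effect described in the abstract: the mRNA curve is shifted later in time even though activation is identical. Changing α instead alters the rise rate and the plateau level β/α rather than the onset. Forward Euler is used here only for transparency; a fitting procedure would more likely use the closed form or a dedicated ODE solver.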