Midha Tripti, Kolomeisky Anatoly B, Igoshin Oleg A
Center for Theoretical Biological Physics, Rice University, Houston, TX 77005.
Department of Chemistry, Rice University, Houston, TX 77005.
Proc Natl Acad Sci U S A. 2025 Jul 15;122(28):e2505040122. doi: 10.1073/pnas.2505040122. Epub 2025 Jul 9.
The fidelity of template-dependent mRNA synthesis during transcription elongation is the primary determinant of accurate gene expression and of the maintenance of functional RNA transcripts. However, the mechanisms governing transcription fidelity remain incompletely understood. Although previous studies have characterized how error rates vary with the identity of the nucleotides upstream and downstream of the incorporation site, a comprehensive microscopic explanation of this sequence dependence has not been established. In this study, we develop a theoretical approach that integrates transcriptional proofreading mechanisms with the effects of an inhomogeneous DNA sequence. Using first-passage analysis validated by Monte Carlo simulations, we quantitatively characterize nucleotide-specific error rates during RNA polymerase II transcription. The model accurately reproduces experimental error rates and predicts the kinetic parameters that influence transcriptional fidelity. The analysis reveals that nucleotide incorporation rates follow the hierarchy U < C < G < A, consistent with independent experimental observations. Notably, our model not only explains how error rates depend on the identity of the base immediately downstream (+1) but also predicts that the nucleotide at the second downstream position (+2) plays an important role: pyrimidines at position +2 are associated with lower error rates than purines, whereas the third downstream base (+3) has no effect. These previously unreported correlations are corroborated by bioinformatic analysis of existing datasets. In addition, using the BRCA1 gene as an example, we explore the physiological implications of sequence-dependent error rates and identify an increased likelihood of errors that create premature stop codons. These findings clarify how DNA sequence context modulates nucleotide incorporation kinetics, advancing our understanding of transcriptional fidelity and its functional consequences.
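The pairing of first-passage analysis with Monte Carlo validation described above can be illustrated on a deliberately minimal toy model. This sketch is not the authors' model. It assumes a single template site with one correct and one incorrect incoming nucleotide and a single proofreading step in which a freshly added nucleotide is either committed by forward translocation or removed by backtracking and cleavage. Solving the first-passage (splitting probability) equations for this chain gives P_err = k_w q_w / (k_w q_w + k_c q_c), where q = f / (f + b) is the probability that an added nucleotide is kept. The class `Rates`, its fields, the functions `error_probability_exact` and `error_probability_mc`, and all rate values are hypothetical and illustrative, not fitted to any data.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical rate constants (arbitrary units) for one incorporation step."""
    k_c: float  # incorporation of the correct nucleotide
    k_w: float  # incorporation of an incorrect nucleotide
    f_c: float  # forward translocation after a correct addition (commits it)
    b_c: float  # backtracking + cleavage that removes a correct nucleotide
    f_w: float  # forward translocation after an error (commits the error)
    b_w: float  # backtracking + cleavage that removes an incorrect nucleotide


def error_probability_exact(r: Rates) -> float:
    """Splitting probability of committing an error (first-passage analysis).

    From the empty site the polymerase enters a correct (C) or incorrect (W)
    state; each either commits (absorbing) or returns to the empty site.
    Solving the linear equation for the absorption probability yields the
    ratio of committed fluxes returned here.
    """
    q_c = r.f_c / (r.f_c + r.b_c)  # probability a correct addition is kept
    q_w = r.f_w / (r.f_w + r.b_w)  # probability an error escapes proofreading
    return r.k_w * q_w / (r.k_w * q_w + r.k_c * q_c)


def error_probability_mc(r: Rates, n_trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate using the embedded jump chain of the same model.

    Only absorption probabilities are estimated, so waiting times are not
    sampled; each transition is chosen with probability proportional to its rate.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_trials):
        while True:
            # Empty site: pick which nucleotide is incorporated, weighted by rate.
            if rng.random() < r.k_w / (r.k_c + r.k_w):
                # Incorrect nucleotide added: commit it or cleave it.
                if rng.random() < r.f_w / (r.f_w + r.b_w):
                    errors += 1
                    break
            else:
                # Correct nucleotide added: commit it or cleave it.
                if rng.random() < r.f_c / (r.f_c + r.b_c):
                    break
            # Otherwise the nucleotide was removed and the site is empty again.
    return errors / n_trials


if __name__ == "__main__":
    rates = Rates(k_c=100.0, k_w=1.0, f_c=10.0, b_c=1.0, f_w=1.0, b_w=10.0)
    print("exact:", error_probability_exact(rates))
    print("Monte Carlo:", error_probability_mc(rates, n_trials=200_000))
```

With these hypothetical rates the exact expression equals 1/1001, roughly tenfold below the 1/101 obtained when proofreading is switched off (b_c = b_w = 0), and the Monte Carlo estimate should agree within sampling noise. Sequence-context effects of the kind reported in the study would presumably enter a model like this by letting the rates depend on neighboring bases, which this sketch does not attempt.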