Predictive model of transcriptional elongation control identifies trans regulatory factors from chromatin signatures.

Affiliations

Institute of Computational Biology, Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany.

Department of Computer Science, TUM School of Computation, Information and Technology, Technical University Munich, Munich, Germany.

Publication information

Nucleic Acids Res. 2023 Feb 28;51(4):1608-1624. doi: 10.1093/nar/gkac1272.

Abstract

Promoter-proximal Polymerase II (Pol II) pausing is a key rate-limiting step for gene expression. DNA- and RNA-binding trans-acting factors regulating the extent of pausing have been identified. However, we lack a quantitative model of how interactions of these factors determine pausing, so the relative importance of the implicated factors is unknown. Moreover, previously unknown regulators might exist. Here we address this gap with a machine learning model that accurately predicts the extent of promoter-proximal Pol II pausing from large-scale genome and transcriptome binding maps as well as gene annotation and sequence composition features. We demonstrate the high accuracy and generalizability of the model by validation on an independent cell line, which reveals the model's cell-line-agnostic character. Model interpretation in light of prior knowledge about the molecular functions of regulatory factors confirms the interconnection of pausing with other RNA processing steps. Harnessing the underlying feature contributions, we assess the relative importance of each factor, quantify their predictive effects, and systematically identify previously unknown regulators of pausing. We additionally identify 16 previously unknown 7SK ncRNA-interacting RNA-binding proteins predictive of pausing. Our work provides a framework to further our understanding of the regulation of the critical early steps in transcriptional elongation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7b9/9976927/021268979068/gkac1272figgra1.jpg
