Chromatin Signature and Transcription Factor Binding Provide a Predictive Basis for Understanding Plant Gene Expression.

Author information

College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China.

Publication information

Plant Cell Physiol. 2019 Jul 1;60(7):1471-1486. doi: 10.1093/pcp/pcz051.

Abstract

Chromatin accessibility and post-translational histone modifications play important roles in regulating gene expression. However, little is known about the joint effect of multiple chromatin modifications on gene expression levels in plants, although the regulatory roles of individual histone marks such as H3K4me3 are well documented. Using machine-learning methods, we systematically predicted gene expression levels from data on multiple chromatin modifications in Arabidopsis and rice. We found that as few as four histone modifications were sufficient to yield good prediction performance, with H3K4me3 and H3K36me3, which have known functions in transcriptional initiation and elongation, respectively, being the top two predictors. We demonstrated that predictive power differed between protein-coding and non-coding genes, as well as between CpG-enriched and CpG-depleted genes. We also showed that a predictive model trained in one tissue or species could be applied to another tissue or species, suggesting shared underlying mechanisms. More interestingly, the expression levels of conserved orthologs are easier to predict than those of species-specific genes. In addition, the chromatin state of distal enhancers was moderately correlated with gene expression but was dispensable once the chromatin features of the proximal regions of genes were given. We further extended the analysis to transcription factor (TF) binding data. Strikingly, the combinatorial effects of only a few TFs roughly fit gene expression levels in Arabidopsis. Overall, by using quantitative modeling, we provide a comprehensive and unbiased perspective on the epigenetic and TF-mediated regulation of gene expression in plants.
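
To make the modeling idea concrete, the following is a minimal sketch of predicting an expression value from a handful of histone-mark signals and scoring the prediction by correlation on held-out genes. It is not the authors' pipeline: the abstract does not say which learning method, features, or data processing were used. Everything here is an assumption for illustration: the data are synthetic, the weights in `TRUE_WEIGHTS` are invented, `markC` and `markD` are placeholder names for the two marks the abstract does not identify, and a plain linear model fitted by gradient descent stands in for whatever machine-learning method the study used. Because the synthetic weights are chosen by hand, the resulting ranking of marks is built in and says nothing about real biology.

```python
import random

# Hypothetical feature set: only H3K4me3 and H3K36me3 are named in the abstract;
# markC and markD are placeholders.
MARKS = ["H3K4me3", "H3K36me3", "markC", "markD"]
# Invented weights used only to generate synthetic data.
TRUE_WEIGHTS = [1.5, 1.0, -0.8, -0.3]


def simulate(n_genes, seed):
    """Generate synthetic per-gene mark signals and expression values."""
    rng = random.Random(seed)
    features, expression = [], []
    for _ in range(n_genes):
        row = [rng.gauss(0, 1) for _ in MARKS]
        signal = sum(w * v for w, v in zip(TRUE_WEIGHTS, row))
        features.append(row)
        expression.append(signal + rng.gauss(0, 0.5))  # add noise
    return features, expression


def fit_linear(features, expression, lr=0.05, epochs=300):
    """Fit expression ~ bias + weights . features by batch gradient descent."""
    n, p = len(features), len(features[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, target in zip(features, expression):
            err = bias + sum(w * v for w, v in zip(weights, row)) - target
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def predict(features, weights, bias):
    return [bias + sum(w * v for w, v in zip(weights, row)) for row in features]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Train on one synthetic gene set, evaluate on an independent one.
X_train, y_train = simulate(500, seed=1)
X_test, y_test = simulate(200, seed=2)
w, b = fit_linear(X_train, y_train)
r = pearson(predict(X_test, w, b), y_test)

# Rank marks by the magnitude of their fitted weight.
ranking = sorted(zip(MARKS, w), key=lambda item: abs(item[1]), reverse=True)
print(f"held-out Pearson r = {r:.3f}")
for name, weight in ranking:
    print(f"{name}: {weight:+.2f}")
```

With real data, the two synthetic gene sets would be replaced by measured signals, and training on one tissue or species while testing on another would follow the same train/evaluate split shown here.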
