Program in Bioinformatics and Integrative Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA.
Genome Biol. 2012 Jun 13;13(9):R53. doi: 10.1186/gb-2012-13-9-r53.
Previous work has demonstrated that chromatin feature levels correlate with gene expression. The ENCODE project enables us to further explore this relationship using an unprecedented volume of data. Expression levels from more than 100,000 promoters were measured using a variety of high-throughput techniques applied to RNA extracted by different protocols from different cellular compartments of several human cell lines. ENCODE also generated genome-wide maps of eleven histone marks, one histone variant, and DNase I hypersensitivity sites in seven cell lines.
We built a novel quantitative model to study the relationship between chromatin features and expression levels. Our study not only confirms that the general relationships found in previous studies hold across various cell lines, but also offers new suggestions about the relationship between chromatin features and gene expression levels. We found that expression status and expression levels can each be predicted with high accuracy, but by different groups of chromatin features. We also found that expression levels measured by CAGE are better predicted than those measured by RNA-PET or RNA-Seq, and that different categories of chromatin features are the most predictive of expression for different RNA measurement methods. Additionally, PolyA+ RNA is overall more predictable than PolyA- RNA across cell compartments, and PolyA+ cytosolic RNA measured with RNA-Seq is more predictable than PolyA+ nuclear RNA, while the opposite is true for PolyA- RNA.
Our study provides new insights into transcriptional regulation by analyzing chromatin features in different cellular contexts.