Department of Biology, Stanford University, Stanford, CA 94305, USA.
Proc Natl Acad Sci U S A. 2009 Dec 22;106(51):21521-6. doi: 10.1073/pnas.0904863106. Epub 2009 Dec 7.
Next-generation sequencing has greatly increased the scope and resolution of studies of transcriptional regulation. RNA sequencing (RNA-Seq) and ChIP-Seq experiments are now generating comprehensive data on transcript abundance and on regulator-DNA interactions. We propose an approach for an integrated analysis of these data based on feature extraction of ChIP-Seq signals, principal component analysis, and regression-based component selection. Compared with traditional methods, our approach not only offers higher power in predicting gene expression from ChIP-Seq data but also provides a way to capture cooperation among regulators. In mouse embryonic stem cells (ESCs), we find that a remarkably high proportion of variation in gene expression (65%) can be explained by the binding signals of 12 transcription factors (TFs). Two groups of TFs are identified. Whereas the first group (E2f1, Myc, Mycn, and Zfx) act as activators in general, the second group (Oct4, Nanog, Sox2, Smad1, Stat3, Tcfcp2l1, and Esrrb) may serve as either activators or repressors depending on the target. The two groups of TFs cooperate tightly to activate genes that are differentially up-regulated in ESCs. In the absence of binding by the first group, the binding of the second group is associated with genes that are repressed in ESCs and derepressed upon early differentiation.
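The general idea of regressing expression on principal components of TF binding signals can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the function name `pc_regression`, the threshold `min_r2`, and the simulated matrices are all assumptions made for this example. The abstract does not specify the ChIP-Seq feature extraction or the component selection criterion, so neither is reproduced here.

```python
import numpy as np

def pc_regression(X, y, min_r2=0.01):
    """Regress y on the principal components of X (hypothetical helper).

    X: (genes x TFs) matrix of binding-signal features.
    y: per-gene expression values.
    min_r2: assumed selection rule; keep a component only if it alone
            explains at least this fraction of the variance in y.
    """
    # Standardize each TF column and center the response.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()

    # PCA via SVD: rows of Vt are loadings, U * s are component scores.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s

    # Scores are mutually orthogonal, so each regression coefficient
    # can be estimated independently of the others.
    coefs = scores.T @ yc / (s ** 2)

    # Fraction of var(y) explained by each component on its own.
    r2_each = (coefs ** 2) * (s ** 2) / (yc @ yc)

    # Regression-based selection: retain only informative components.
    keep = r2_each >= min_r2
    y_hat = y.mean() + scores[:, keep] @ coefs[keep]
    return {
        "loadings": Vt,
        "coefs": coefs,
        "r2_each": r2_each,
        "keep": keep,
        "y_hat": y_hat,
        "r2": r2_each[keep].sum(),
    }

# Synthetic stand-in data: 2000 genes, 12 correlated TF binding signals.
rng = np.random.default_rng(0)
n_genes, n_tfs = 2000, 12
latent = rng.normal(size=(n_genes, 3))
X = latent @ rng.normal(size=(3, n_tfs)) + 0.5 * rng.normal(size=(n_genes, n_tfs))
beta = rng.normal(size=n_tfs)
y = X @ beta + rng.normal(scale=2.0, size=n_genes)

fit = pc_regression(X, y)
print("components kept:", int(fit["keep"].sum()), "of", n_tfs)
print("R^2 from kept components:", round(float(fit["r2"]), 3))
```

Because the component scores are orthogonal, the variance explained by the retained components is simply the sum of their individual contributions, and the signs of the loadings on each retained component indicate which TFs vary together in that component.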