Human Genome Center, Institute of Medical Science, University of Tokyo, Japan.
BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S50. doi: 10.1186/1471-2105-12-S1-S50.
Understanding the gene regulatory system that governs the self-renewal and pluripotency of embryonic stem cells (ESCs) is an important step toward promoting regenerative medicine. Although the roles of several core transcription factors (TFs), such as Oct4, Sox2 and Nanog, have been intensively investigated, the details of their involvement in genome-wide gene regulation remain unclear.
We constructed a predictive model of genome-wide gene expression in mouse ESCs from publicly available ChIP-seq data for 12 core TFs. The tag sequences were remapped to the genome with various alignment tools. The binding density of each TF was then calculated from the genome-wide bona fide TF binding sites. The TF-binding data were combined with data on several epigenetic states (DNA methylation, several histone modifications, and CpG islands) of promoter regions. These data, together with the ordinary peak intensity data, were used as predictors in a simple linear regression model that predicts absolute gene expression. We also developed a pipeline for analyzing the effects of the predictors and their interactions.
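As a rough illustration of the kind of model described above, the following minimal sketch fits a linear regression with main effects and pairwise interaction terms by ordinary least squares. It is not the authors' pipeline: the data are synthetic, the function names `build_design_matrix` and `fit_linear_model` are hypothetical, and the feature `H3K4me3` is only an assumed example of a histone modification (the abstract does not name the marks used).

```python
import itertools

import numpy as np


def build_design_matrix(features, names):
    """Return (X, column_names): intercept, main effects, pairwise products."""
    n_genes, n_features = features.shape
    columns = [np.ones(n_genes)]
    column_names = ["intercept"]
    # Main effects: one column per predictor (e.g. TF binding density).
    for j in range(n_features):
        columns.append(features[:, j])
        column_names.append(names[j])
    # Pairwise interactions: the product of every pair of predictors.
    for i, j in itertools.combinations(range(n_features), 2):
        columns.append(features[:, i] * features[:, j])
        column_names.append(f"{names[i]}:{names[j]}")
    return np.column_stack(columns), column_names


def fit_linear_model(X, y):
    """Ordinary least squares fit; returns coefficients and R^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


# Synthetic data only (assumption): 500 "genes", 4 hypothetical predictors.
rng = np.random.default_rng(0)
names = ["Oct4", "Sox2", "Nanog", "H3K4me3"]
features = rng.normal(size=(500, len(names)))
expression = (
    1.0
    + 0.8 * features[:, 0]                   # Oct4 main effect
    + 0.5 * features[:, 3]                   # H3K4me3 main effect
    + 0.6 * features[:, 0] * features[:, 1]  # Oct4:Sox2 interaction
    + rng.normal(scale=0.1, size=500)        # measurement noise
)

X, column_names = build_design_matrix(features, names)
coef, r_squared = fit_linear_model(X, expression)
for name, value in zip(column_names, coef):
    print(f"{name:>16s} {value:+.3f}")
print(f"R^2 = {r_squared:.3f}")
```

In a sketch like this, the size of each interaction coefficient gives a first indication of which predictor pairs act together; assessing statistical significance, as in the study, would require an additional testing step.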
Through our analysis, we identified two classes of genes: those that are well explained by our model and those that are poorly explained. The latter class appears to consist of genes that are not directly regulated by the core TFs. The regulatory regions of these two gene classes show distinct patterns of DNA methylation, histone modifications, CpG island presence, and gene ontology terms, suggesting the relative importance of epigenetic effects. Furthermore, we identified statistically significant TF interactions that correlate with the epigenetic modification patterns.
Here, we propose an improved prediction method for explaining ESC-specific gene expression. Our study implies that the majority of genes are more or less directly regulated by the core TFs. In addition, our results are consistent with the general idea that epigenetic effects are relatively important in ESCs.