Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada.
BMC Genomics. 2012 Apr 26;13:152. doi: 10.1186/1471-2164-13-152.
BACKGROUND: Epigenetic modifications, transcription factor (TF) availability and differences in chromatin folding influence how the genome is interpreted by the transcriptional machinery responsible for gene expression. Enhancers located in non-coding regions show significant differences in histone marks between cell types. In contrast, gene promoters show more uniform modifications across cell types. Here we used histone modification and chromatin-associated protein ChIP-Seq data sets from mouse embryonic stem (ES) cells, together with genomic features, to identify functional enhancer regions. Using co-bound sites of OCT4, SOX2 and NANOG (co-OSN, validated enhancers) as the positive training set and co-bound sites of MYC and MYCN (limited enhancer activity) as the negative training set, we performed multinomial logistic regression with LASSO regularization to identify key features.
RESULTS: Cross-validation revealed that a combination of p300, H3K4me1, MED12 and NIPBL features forms the top signatures of co-OSN regions. Using a model built from 10 signatures, 83% of the top 1277 putative 1 kb enhancer regions (probability greater than or equal to 0.8) overlapped with at least one TF peak from 7 mouse ES cell ChIP-Seq data sets. These putative enhancers are associated with increased expression of neighbouring genes and are significantly enriched in loci bound by multiple TFs, in agreement with combinatorial models of TF binding. Furthermore, we identified several motifs of known TFs that are significantly enriched in putative enhancer regions compared with random promoter regions and background. Comparison with the active mark H3K27ac in various cell types confirmed the cell type-specificity of these enhancers.
CONCLUSIONS: The top enhancer signatures we identified (p300, H3K4me1, MED12 and NIPBL) will allow for the identification of cell type-specific enhancer regions in diverse cell types.
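The feature-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes a two-class setting (enhancer-positive versus enhancer-negative regions), where multinomial logistic regression reduces to binary logistic regression, and it fits the L1 (LASSO) penalty with a simple proximal gradient solver chosen here for brevity. The data are synthetic, and the `noise_1` and `noise_2` columns, all function names, and the values of `lam`, `lr` and `n_iter` are hypothetical; only the four signature names and the 0.8 probability cut-off come from the abstract.

```python
import math
import random

# Four signature names from the abstract plus two hypothetical uninformative columns.
FEATURES = ["p300", "H3K4me1", "MED12", "NIPBL", "noise_1", "noise_2"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def predict_proba(X, w, b):
    """Probability that each region belongs to the positive (enhancer) class."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


def fit_lasso_logistic(X, y, lam=0.1, lr=0.1, n_iter=300):
    """L1-penalised logistic regression via proximal gradient descent.

    Each iteration takes a gradient step on the mean log-loss, then
    soft-thresholds the weights (the intercept is not penalised).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        errs = [pi - yi for pi, yi in zip(predict_proba(X, w, b), y)]
        grad_b = sum(errs) / n
        grad_w = [sum(e * row[j] for e, row in zip(errs, X)) / n for j in range(p)]
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def make_synthetic(n_per_class=100, seed=0):
    """Synthetic stand-in for ChIP-Seq signal: NOT real data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            informative = [rng.gauss(mean, 1.0) for _ in range(4)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(2)]
            X.append(informative + noise)
            y.append(label)
    return X, y


X, y = make_synthetic()
weights, bias = fit_lasso_logistic(X, y)
probs = predict_proba(X, weights, bias)
accuracy = sum((p >= 0.5) == bool(t) for p, t in zip(probs, y)) / len(y)

# Features whose coefficients survive the L1 penalty.
selected = [name for name, w in zip(FEATURES, weights) if w != 0.0]
# Regions called as putative enhancers at the abstract's probability cut-off.
putative = [i for i, p in enumerate(probs) if p >= 0.8]

print("selected features:", selected)
print("training accuracy:", round(accuracy, 3))
print("regions with probability >= 0.8:", len(putative))
```

The L1 penalty is what makes the model useful for ranking signatures: coefficients of features that carry little information about the class label are pushed to or near zero, so the surviving features are the candidate signatures. In practice an established solver and a cross-validated choice of the penalty strength would replace the hand-written loop and the fixed `lam` used here.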