Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada.
Department of Computer Science, University of Toronto, Toronto, M5S 2E4, Canada.
Genome Res. 2021 Apr;31(4):564-575. doi: 10.1101/gr.272468.120. Epub 2021 Mar 12.
Transcriptional enhancers are critical for development and phenotype evolution and are often mutated in disease contexts; however, even in well-studied cell types, the sequence code conferring enhancer activity remains unknown. To examine the enhancer regulatory code for pluripotent stem cells, we identified genomic regions with conserved binding of multiple transcription factors in mouse and human embryonic stem cells (ESCs). Examination of these regions revealed that they contain on average 12.6 conserved transcription factor binding site (TFBS) sequences. Enriched TFBSs are a diverse repertoire of 70 different sequences representing the binding sequences of both known and novel ESC regulators. Using a diverse set of TFBSs from this repertoire was sufficient to construct short synthetic enhancers with activity comparable to native enhancers. Site-directed mutagenesis of conserved TFBSs in endogenous enhancers or TFBS deletion from synthetic sequences revealed a requirement for 10 or more different TFBSs. Furthermore, specific TFBSs, including the POU5F1:SOX2 comotif, are dispensable, despite cobinding the POU5F1 (also known as OCT4), SOX2, and NANOG master regulators of pluripotency. These findings reveal that a TFBS sequence diversity threshold overrides the need for optimized regulatory grammar and individual TFBSs that recruit specific master regulators.
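The diversity threshold described in the abstract (activity requiring 10 or more different TFBSs) can be illustrated with a minimal sketch. This is not the authors' method: the motif strings below are arbitrary placeholders rather than the 70 enriched sequences from the study, the function names are hypothetical, and matching is done by exact forward-strand substring search instead of position weight matrices or reverse-complement scanning.

```python
# Illustrative only: placeholder motifs, NOT the enriched TFBS repertoire from the paper.
PLACEHOLDER_MOTIFS = {
    f"motif_{i}": pattern
    for i, pattern in enumerate(
        [
            "ACGTAC", "TTGACA", "GGCATT", "CATTGT", "ATGCAA", "CCGGAA",
            "TGACTC", "GATAAG", "CACGTG", "AGGTCA", "TAATTA", "GCCAAT",
        ],
        start=1,
    )
}


def distinct_tfbs(sequence, motifs):
    """Return the names of motifs that occur at least once in the sequence.

    Assumption: exact, case-insensitive substring match on the forward strand.
    """
    seq = sequence.upper()
    return {name for name, pattern in motifs.items() if pattern.upper() in seq}


def meets_diversity_threshold(sequence, motifs, threshold=10):
    """True if the sequence contains at least `threshold` different motifs.

    The default of 10 mirrors the requirement reported in the abstract.
    """
    return len(distinct_tfbs(sequence, motifs)) >= threshold


# Hypothetical synthetic sequences: motifs joined by an "N" spacer so that
# no motif can match across a junction.
all_patterns = list(PLACEHOLDER_MOTIFS.values())
diverse_seq = "N".join(all_patterns)      # 12 different motifs
sparse_seq = "N".join(all_patterns[:9])   # 9 different motifs

print(len(distinct_tfbs(diverse_seq, PLACEHOLDER_MOTIFS)),
      meets_diversity_threshold(diverse_seq, PLACEHOLDER_MOTIFS))
print(len(distinct_tfbs(sparse_seq, PLACEHOLDER_MOTIFS)),
      meets_diversity_threshold(sparse_seq, PLACEHOLDER_MOTIFS))
```

The check counts different motifs rather than total motif occurrences, so repeating one site many times does not raise the count. It also ignores motif order and spacing, in keeping with the abstract's conclusion that diversity overrides the need for optimized regulatory grammar.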