Program in Bioinformatics and Integrative Biology, UMass Medical School, Worcester, MA, USA.
Nucleic Acids Res. 2021 Jun 4;49(10):5705-5725. doi: 10.1093/nar/gkab345.
Gene expression is controlled by regulatory elements within accessible chromatin. Although most regulatory elements are cell type-specific, a subset is accessible in nearly all of the 517 human and 94 mouse cell and tissue types assayed by the ENCODE consortium. We systematically analyzed 9000 human and 8000 mouse ubiquitously accessible candidate cis-regulatory elements (cCREs) with promoter-like signatures (PLSs) from ENCODE, which we denote ubi-PLSs. These are more CpG-rich than non-ubi-PLSs and correspond to genes with ubiquitously high transcription, including a majority of cell-essential genes. ubi-PLSs are enriched with motifs of ubiquitously expressed transcription factors and are preferentially bound by transcriptional cofactors regulating ubiquitously expressed genes. They are highly conserved between human and mouse at the synteny level but exhibit frequent turnover of motif sites; accordingly, ubi-PLSs show increased variation at their centers compared with flanking regions among the ∼186 thousand human genomes sequenced by the TOPMed project. Finally, ubi-PLSs are enriched in genes implicated in Mendelian diseases, especially diseases broadly impacting most cell types, such as deficiencies in mitochondrial functions. Thus, a set of roughly 9000 mammalian promoters is actively maintained in an accessible state across cell types by a distinct set of transcription factors and cofactors to ensure the transcriptional programs of cell-essential genes.
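The notion of an element being accessible in "nearly all" assayed cell and tissue types can be illustrated with a minimal sketch. It assumes each element comes with a list of per-biosample accessibility calls. The function name `ubiquitous_elements`, the element IDs, and the 0.9 cutoff are hypothetical and chosen only for illustration; the abstract does not state the threshold or procedure the authors used.

```python
from typing import Dict, List

# Hypothetical cutoff: the abstract says "nearly all" without giving a number.
MIN_FRACTION = 0.9


def ubiquitous_elements(
    accessibility: Dict[str, List[bool]],
    min_fraction: float = MIN_FRACTION,
) -> List[str]:
    """Return IDs of elements accessible in at least min_fraction of biosamples."""
    selected = []
    for element_id, calls in accessibility.items():
        if not calls:
            continue  # no biosamples assayed for this element; skip it
        fraction = sum(calls) / len(calls)  # True counts as 1, False as 0
        if fraction >= min_fraction:
            selected.append(element_id)
    return selected


# Toy data (invented): three elements assayed in 20 biosamples each.
toy = {
    "cCRE_A": [True] * 20,                # accessible everywhere
    "cCRE_B": [True] * 19 + [False],      # accessible in 19 of 20
    "cCRE_C": [True] * 5 + [False] * 15,  # cell type-specific
}
result = ubiquitous_elements(toy)
print(result)
```

With the toy data, only the first two elements pass the assumed cutoff; a real analysis would take the accessibility calls from ENCODE data rather than hand-written lists.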