Hudaiberdiev Sanjarbek, Ovcharenko Ivan
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD.
bioRxiv. 2024 Aug 4:2023.02.05.527203. doi: 10.1101/2023.02.05.527203.
Enhancers and promoters are classically considered to be bound by a small set of transcription factors (TFs) in a sequence-specific manner. This assumption has come under increasing skepticism as the number of TF ChIP-seq datasets has expanded. In particular, high-occupancy target (HOT) loci attract hundreds of TFs, often with no detectable correlation between ChIP-seq peaks and the presence of DNA-binding motifs. Here, we used a set of 1,003 TF ChIP-seq datasets (HepG2, K562, H1) to analyze the patterns of ChIP-seq peak co-occurrence in combination with functional genomics datasets. We identified 43,891 HOT loci forming at promoter (53%) and enhancer (47%) regions. HOT promoters regulate housekeeping genes, whereas HOT enhancers are involved in the regulation of tissue-specific processes. HOT loci form the foundation of human super-enhancers and evolve under strong negative selection, with some of these loci located in ultraconserved regions. Sequence-based classification analysis of HOT loci suggested that their formation is driven by sequence features, and the density of mapped ChIP-seq peaks across TF-bound loci correlates with sequence features and with the expression level of flanking genes. Based on their affinities to bind promoters and enhancers, we detected 5 distinct clusters of TFs that form the core of the HOT loci. We report an abundance of HOT loci in the human genome and a commitment of 51% of all TF ChIP-seq binding events to HOT locus formation, thus challenging the classical model of enhancer activity, and we propose a model of HOT locus formation based on the existence of large transcriptional condensates.
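The abstract does not spell out how HOT loci are called. As a rough illustration only, the core idea can be sketched in Python: a locus counts as "high-occupancy" when peaks from many distinct TFs overlap it. The last function shows how a statistic like the share of all binding events that fall in HOT loci could be computed. Several parts are hypothetical assumptions and do not come from the authors' pipeline: the fixed 400-bp bins, the `min_tfs` cutoff, the function names, and the toy peaks.

```python
from collections import defaultdict

# Hypothetical sketch, NOT the authors' method: fixed-size genomic bins and a
# simple "number of distinct TFs" cutoff stand in for the real HOT-calling procedure.
BIN_SIZE = 400  # assumed bin width in bp


def bins_for_peak(start, end, bin_size=BIN_SIZE):
    """Return the indices of bins overlapped by a half-open [start, end) peak."""
    return range(start // bin_size, (end - 1) // bin_size + 1)


def count_tfs_per_bin(peaks, bin_size=BIN_SIZE):
    """Map (chrom, bin_index) -> set of TF names with a peak overlapping that bin.

    `peaks` is an iterable of (tf_name, chrom, start, end) tuples.
    """
    bound = defaultdict(set)
    for tf, chrom, start, end in peaks:
        for b in bins_for_peak(start, end, bin_size):
            bound[(chrom, b)].add(tf)
    return bound


def call_hot_bins(bound, min_tfs):
    """Keep bins bound by at least `min_tfs` distinct TFs (hypothetical cutoff)."""
    return {key for key, tfs in bound.items() if len(tfs) >= min_tfs}


def fraction_of_peaks_in_hot(peaks, hot_bins, bin_size=BIN_SIZE):
    """Fraction of all peaks (binding events) that overlap at least one HOT bin."""
    peaks = list(peaks)
    if not peaks:
        return 0.0
    in_hot = sum(
        1
        for _tf, chrom, start, end in peaks
        if any((chrom, b) in hot_bins for b in bins_for_peak(start, end, bin_size))
    )
    return in_hot / len(peaks)


# Toy, made-up peaks: three TFs pile up in one bin, one TF binds alone elsewhere.
toy_peaks = [
    ("TF_A", "chr1", 100, 300),
    ("TF_B", "chr1", 150, 350),
    ("TF_C", "chr1", 200, 390),
    ("TF_A", "chr1", 5000, 5200),
]
bound = count_tfs_per_bin(toy_peaks)
hot = call_hot_bins(bound, min_tfs=3)
print(sorted(hot))                                  # [('chr1', 0)]
print(fraction_of_peaks_in_hot(toy_peaks, hot))     # 0.75
```

With real data, the cutoff would be set from the distribution of per-bin TF counts rather than fixed by hand. Counting distinct TFs instead of raw peaks keeps replicate or overlapping peaks from a single TF from inflating a bin's occupancy.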