HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA.
Department of Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA.
Genome Res. 2020 Jul;30(7):939-950. doi: 10.1101/gr.260463.119. Epub 2020 Jul 2.
DNA-associated proteins (DAPs) classically regulate gene expression by binding to regulatory loci such as enhancers or promoters. As expanding catalogs of genome-wide DAP binding maps reveal thousands of loci that, unlike the majority of conventional enhancers and promoters, associate with dozens of different DAPs with apparently little regard for motif preference, an understanding of DAP association and coordination at such regulatory loci is essential to deciphering how these regions contribute to normal development and disease. In this study, we aggregated publicly available ChIP-seq data from 469 human DAPs assayed in three cell lines and integrated these data with an orthogonal data set of 352 nonredundant, in vitro-derived motifs mapped to the genome within DNase I hypersensitivity footprints to characterize regions with high numbers of DAP associations. We establish a generalizable definition for high occupancy target (HOT) loci and identify putative driver DAP motifs in HepG2 cells, including HNF4A, SP1, SP5, and ETV4, that are highly prevalent and show sequence conservation at HOT loci. The number of different DAPs associated with an element is positively associated with evidence of regulatory activity, and by systematically mutating 245 HOT loci with a massively parallel mutagenesis assay, we localized regulatory activity to a central core region that depends on the motif sequences of our previously nominated driver DAPs. In sum, this work leverages the increasingly large number of DAP motif and ChIP-seq data publicly available to explore how DAP associations contribute to genome-wide transcriptional regulation.
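The idea of counting how many different DAPs associate with a locus, and flagging loci with unusually high counts, can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state how peaks were merged or what cutoff defines a HOT locus, so the merging rule (any overlap of half-open, BED-style intervals), the function names `merge_and_count` and `call_hot`, and the `min_fraction` parameter are all assumptions made for illustration, as are the toy peaks.

```python
from collections import defaultdict


def merge_and_count(peaks):
    """Merge overlapping peaks per chromosome into loci.

    peaks: iterable of (chrom, start, end, dap) with half-open coordinates.
    Returns a list of (chrom, start, end, set_of_daps), sorted by position.
    Illustrative helper only; the merging rule is an assumption.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, dap in peaks:
        by_chrom[chrom].append((start, end, dap))

    loci = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        cur_daps = set()
        for start, end, dap in sorted(by_chrom[chrom]):
            if cur_start is not None and start < cur_end:
                # Peak overlaps the open locus: extend it and record the DAP.
                cur_end = max(cur_end, end)
                cur_daps.add(dap)
            else:
                # No overlap: close the previous locus (if any), open a new one.
                if cur_start is not None:
                    loci.append((chrom, cur_start, cur_end, cur_daps))
                cur_start, cur_end, cur_daps = start, end, {dap}
        if cur_start is not None:
            loci.append((chrom, cur_start, cur_end, cur_daps))
    return loci


def call_hot(loci, n_assayed, min_fraction=0.25):
    """Keep loci bound by at least min_fraction of the assayed DAPs.

    min_fraction is a hypothetical cutoff, not a value taken from the paper.
    Returns (chrom, start, end, n_distinct_daps) tuples.
    """
    return [
        (chrom, start, end, len(daps))
        for chrom, start, end, daps in loci
        if len(daps) / n_assayed >= min_fraction
    ]


# Toy input: four assayed DAPs, three of which overlap at one chr1 locus.
toy_peaks = [
    ("chr1", 100, 200, "HNF4A"),
    ("chr1", 150, 250, "SP1"),
    ("chr1", 180, 220, "ETV4"),
    ("chr1", 1000, 1100, "SP1"),
    ("chr2", 50, 80, "HNF4A"),
]
toy_loci = merge_and_count(toy_peaks)
toy_hot = call_hot(toy_loci, n_assayed=4, min_fraction=0.5)
```

Expressing the cutoff as a fraction of assayed DAPs, rather than an absolute count, is one way such a definition could carry across cell lines with different numbers of assayed factors; whether this matches the generalizable definition established in the study cannot be determined from the abstract alone.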