Department of Data Sciences, Dana-Farber Cancer Institute; Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Biological and Biomedical Science Program, Harvard Medical School, Boston, MA, USA.
Nat Commun. 2020 May 18;11(1):2472. doi: 10.1038/s41467-020-16106-x.
Characterization of the genomic distances over which transcription factor (TF) binding influences gene expression is important for inferring target genes from TF chromatin immunoprecipitation followed by sequencing (ChIP-seq) data. Here we systematically examine the relationship between thousands of TF and histone modification ChIP-seq data sets and thousands of gene expression profiles. We develop a model for integrating these data, which reveals two classes of TFs with distinct ranges of regulatory influence, chromatin-binding preferences, and auto-regulatory properties. We find that the regulatory range of the same TF bound within different topologically associating domains (TADs) depends not only on intrinsic TAD properties, such as local gene density and G/C content, but also on TAD chromatin states. Our results suggest that considering TF type, binding distance to the gene locus, and chromatin context is important for identifying the TFs implicated by GWAS SNPs.
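The abstract does not give the functional form of the integration model. One common way to express the idea that a binding site's influence on a gene fades with genomic distance is a distance-weighted score, sometimes called a regulatory potential. The sketch below assumes an exponential decay whose weight halves every `half_decay_bp` base pairs; the decay form, the function names `regulatory_potential` and `rank_genes`, and the toy coordinates are hypothetical illustrations, not the authors' published method.

```python
from typing import Dict, Iterable, List, Tuple


def regulatory_potential(peaks: Iterable[int], tss: int, half_decay_bp: float) -> float:
    """Sum distance-weighted contributions of TF peaks to one gene.

    ASSUMPTION (hypothetical model): each peak contributes 2 ** (-d / half_decay_bp),
    where d is its absolute distance in bp from the TSS, so a peak exactly
    half_decay_bp away counts half as much as a peak sitting on the TSS.
    """
    if half_decay_bp <= 0:
        raise ValueError("half_decay_bp must be positive")
    return sum(2.0 ** (-abs(p - tss) / half_decay_bp) for p in peaks)


def rank_genes(
    peaks: List[int], gene_tss: Dict[str, int], half_decay_bp: float
) -> List[Tuple[str, float]]:
    """Score every gene against the same peak set and sort, highest score first.

    Assumes all peaks and TSSs lie on one chromosome; strand and peak
    strength are ignored in this sketch.
    """
    scores = {
        gene: regulatory_potential(peaks, tss, half_decay_bp)
        for gene, tss in gene_tss.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example with hypothetical coordinates: one peak near geneA,
# a cluster of peaks about 20 kb upstream of geneB.
peaks = [1_200, 40_000, 41_000, 42_000]
gene_tss = {"geneA": 1_000, "geneB": 61_000}

short_range = rank_genes(peaks, gene_tss, half_decay_bp=1_000)
long_range = rank_genes(peaks, gene_tss, half_decay_bp=20_000)
print(short_range)
print(long_range)
```

In this sketch, `half_decay_bp` plays the role of a regulatory range: with a short value only peaks close to the TSS matter, so geneB scores almost nothing, while a longer value lets the distal peak cluster contribute substantially to geneB's score.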