Departments of Computer Science, Stanford University, 240 Pasteur Drive, Stanford, California, 94305, USA.
Present address: Department of Computational Biology, Carnegie Mellon University, 5000 Forbes Avenue, Gates-Hillman Building Room 7703, Pittsburgh, PA, 15213, USA.
BMC Genomics. 2022 Apr 12;23(1):295. doi: 10.1186/s12864-022-08486-9.
Many transcription factors (TFs), such as multi zinc-finger (ZF) TFs, have multiple DNA binding domains (DBDs), and deciphering the DNA binding motifs of individual DBDs is a major challenge. One example of such a TF is CCCTC-binding factor (CTCF), a TF with eleven ZFs that plays a variety of roles in transcriptional regulation, most notably anchoring DNA loops. Previous studies found that CTCF ZFs 3-7 bind CTCF's core motif and ZFs 9-11 bind a specific upstream motif, but the motifs of ZFs 1-2 have yet to be identified.
We developed a new approach to identifying the binding motifs of individual DBDs of a TF through analyzing chromatin immunoprecipitation sequencing (ChIP-seq) experiments in which a single DBD is mutated: we train a deep convolutional neural network to predict whether wild-type TF binding sites are preserved in the mutant TF dataset and interpret the model. We applied this approach to mouse CTCF ChIP-seq data and identified the known binding preferences of CTCF ZFs 3-11 as well as a putative GAG binding motif for ZF 1. We analyzed other CTCF datasets to provide additional evidence that ZF 1 is associated with binding at the motif we identified, and we found that the presence of the motif for ZF 1 is associated with CTCF ChIP-seq peak strength.
Our approach can be applied to any TF for which in vivo binding data from both the wild-type and mutated versions of the TF are available, and our findings provide new potential insights into the binding preferences of CTCF's DBDs.
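The data preparation implied by the approach above (deciding whether each wild-type binding site is preserved in the mutant dataset, then encoding its sequence for a convolutional network) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not give the labeling criterion or the model architecture, so the simple interval-overlap rule and the helper names `one_hot`, `overlaps`, and `label_preserved` are hypothetical.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of [A, C, G, T]; any other symbol (e.g. N) becomes all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def label_preserved(wt_peaks, mut_peaks):
    """Label each wild-type peak 1 if any mutant peak overlaps it, else 0.

    Assumption: overlap alone defines "preserved"; a real analysis may also
    use peak strength or replicate agreement.
    """
    return [int(any(overlaps(wt, m) for m in mut_peaks)) for wt in wt_peaks]

# Toy peaks (hypothetical coordinates, for illustration only).
wt_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
mut_peaks = [("chr1", 150, 250), ("chr2", 300, 400)]

labels = label_preserved(wt_peaks, mut_peaks)
encoded = one_hot("AcN")
```

The resulting labels and one-hot matrices would be the inputs to a binary classifier; interpreting that trained model is what would point to the sequence features associated with the mutated DBD.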