Zhou Qin, Alberto de la Paz Jose, Stanowick Alexander D, Lin Xingcheng, Morcos Faruck
Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, United States.
Department of Physics, North Carolina State University, Raleigh, NC 27695, United States.
Nucleic Acids Res. 2025 Jun 20;53(12). doi: 10.1093/nar/gkaf592.
DNA-transcription factor (TF) interactions are essential for gene regulation. Fully characterizing TF recognition specificities and identifying their genomic binding targets are important for understanding TF function and regulatory networks. Recently, the high-throughput sequencing technology HT-SELEX (high-throughput systematic evolution of ligands by exponential enrichment) has been used to profile hundreds of TFs, providing massive datasets that capture TF binding preferences. However, comprehensive computational models are still needed to fully extract and characterize critical TF binding preferences and to distinguish genome-wide binding targets. In this study, we developed a global pairwise model called DCA-Scapes, trained on experimental HT-SELEX data. Our approach uncovered high-resolution TF recognition specificity landscapes and enabled the prediction of in vivo binding sequences, which we validated with ChIP-seq (chromatin immunoprecipitation sequencing) data. In addition, the DCA-Scapes model was used to refine the locations of binding regions and to accurately identify the binding sites within ChIP-seq enriched peaks. Moreover, we extended our model to cover the entire human genome, uncovering potential TF target sites that exhibit tissue-specific TF recognition across various cellular environments.
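To make the idea of a global pairwise model concrete, the following is a minimal sketch of how a generic Potts-style energy with per-position fields and pairwise couplings could score candidate sites and locate the most favorable window inside a longer region such as a ChIP-seq peak. It is not the DCA-Scapes implementation: the function names, the parameter arrays `h` and `J`, the sign convention (lower energy means more favorable), and the forward-strand-only scan are all assumptions made for illustration.

```python
import numpy as np

ALPHABET = "ACGT"
IDX = {base: i for i, base in enumerate(ALPHABET)}


def encode(seq):
    """Map a DNA string to integer indices (A=0, C=1, G=2, T=3)."""
    return np.array([IDX[base] for base in seq.upper()], dtype=int)


def pairwise_energy(site, h, J):
    """Energy of one site under a hypothetical pairwise model.

    site : integer-encoded sequence of length L
    h    : array of shape (L, 4), per-position fields (assumed)
    J    : array of shape (L, L, 4, 4), couplings; only i < j is read (assumed)
    Lower energy is treated as more favorable binding.
    """
    L = len(site)
    energy = -h[np.arange(L), site].sum()
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, site[i], site[j]]
    return float(energy)


def best_site(region, h, J):
    """Slide a window of model length over the forward strand of `region`
    and return (offset, subsequence, energy) for the lowest-energy window."""
    L = h.shape[0]
    encoded = encode(region)
    energies = [
        pairwise_energy(encoded[k:k + L], h, J)
        for k in range(len(encoded) - L + 1)
    ]
    k = int(np.argmin(energies))
    return k, region[k:k + L], energies[k]
```

In practice the parameters would be inferred from enriched HT-SELEX reads rather than set by hand, and a real scan would typically also score the reverse complement; both steps are omitted here.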