Chai Lu, Gao Jie, Li Zihan, Sun Hao, Liu Junjie, Wang Yong, Zhang Lirong
School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China.
CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, People's Republic of China.
Sci Rep. 2024 Dec 30;14(1):31744. doi: 10.1038/s41598-024-82238-5.
The CCCTC-binding factor (CTCF) is pivotal in orchestrating diverse biological functions across the human genome, yet the mechanisms driving its cell type-active DNA binding affinity remain underexplored. Here, we collected ChIP-seq data from 67 cell lines in ENCODE, constructed a unique dataset of cell type-active CTCF binding sites (CBS), and trained convolutional neural networks (CNNs) to dissect the patterns of CTCF binding activity. Our analysis reveals that the transcription factors RAD21/SMC3 and chromatin accessibility are more predictive than sequence motifs and histone modifications. Integrating these features achieved area under the precision-recall curve (AUPRC) values consistently above 0.868, highlighting their utility in deciphering CTCF binding dynamics. This study provides a deeper understanding of the regulatory functions of CTCF through a machine learning framework.
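For readers unfamiliar with the reported metric, the following is a minimal sketch of how an AUPRC value can be computed from binary labels and classifier scores, using the step-wise (average precision) estimate. The abstract does not state which estimator or software the authors used, so the function name `average_precision`, the toy labels, and the toy scores are illustrative assumptions, and the sketch assumes the scores are distinct (no tie handling).

```python
import numpy as np

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Hypothetical helper for illustration only; not the authors' code.
    labels: 1 for an active binding site, 0 for an inactive one.
    scores: model outputs, higher meaning more likely active.
    Assumes distinct scores (ties are not grouped).
    """
    labels = np.asarray(labels, dtype=int)
    scores = np.asarray(scores, dtype=float)
    n_pos = labels.sum()
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    # Precision at each rank = true positives so far / examples so far.
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision over the ranks where a positive is recovered.
    return float((precision * hits).sum() / n_pos)

# Toy example with made-up values (not data from the study).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(toy_labels, toy_scores))
```

Unlike accuracy or AUROC, this quantity depends on the fraction of positives in the evaluation set, so AUPRC values are most comparable between models evaluated on the same data.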