Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China.
Beidahuang Industry Group General Hospital, Harbin, China.
Methods. 2024 Dec;232:96-106. doi: 10.1016/j.ymeth.2024.11.004. Epub 2024 Nov 7.
The three-dimensional structure of chromatin is crucial for the regulation of gene expression. YY1 promotes enhancer-promoter interactions in a manner analogous to CTCF-mediated chromatin interactions. However, little is known about which YY1 binding sites can form loop anchors. In this study, a LightGBM model was used to predict YY1-loop anchors by integrating multi-omics data. Because the numbers of positive and negative samples were highly imbalanced, we used the area under the precision-recall curve (AUPRC) to assess classifier quality. The LightGBM model showed strong predictive performance (AUPRC≥0.93). To verify the robustness of the model, the dataset was divided into training and test sets at a 4:1 ratio. The model performed well for YY1-loop anchor prediction on both the training set and the independent test set. Additionally, we ranked the features by importance and found that the formation of YY1-loop anchors is primarily influenced by the co-binding of the transcription factors CTCF, SMC3, and RAD21, as well as by histone modifications and sequence context.
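The evaluation setup described in the abstract (a 4:1 split and AUPRC as the metric for imbalanced classes) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `average_precision` and `stratified_split` are hypothetical, AUPRC is approximated here by average precision, and stratifying the split by class is an assumption that the abstract does not state.

```python
import random


def average_precision(labels, scores):
    """Estimate AUPRC as average precision.

    labels: list of 0/1 ground-truth values (1 = loop anchor, by assumption).
    scores: list of classifier scores, higher meaning more likely positive.
    Assumes scores are distinct; tied scores are not treated specially.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each positive
    return ap / total_pos


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) for a 4:1 split by default.

    Each class is split separately so that the positive/negative ratio
    is roughly preserved in both parts (an assumption of this sketch).
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

AUPRC suits imbalanced data because a classifier that scores samples at random has an expected AUPRC close to the fraction of positives, so the baseline falls as positives become rarer and a high value is harder to reach by chance. In practice, a library routine such as scikit-learn's `average_precision_score` would normally replace the hand-written metric.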