Department of Computer Science, School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China.
Department of Computer Science, The University of British Columbia Okanagan, Kelowna, BC V1V 1V5, Canada.
Bioinformatics. 2021 Dec 11;37(24):4635-4642. doi: 10.1093/bioinformatics/btab534.
CTCF-mediated chromatin loops underlie the formation of topologically associating domains and serve as the structural basis for transcriptional regulation. However, the mechanism by which these loops form remains unclear, and mapping them genome-wide is costly and difficult. Motivated by recent studies on the formation mechanism of CTCF-mediated loops, we studied whether transitivity-related information about interacting CTCF anchors can be used to predict CTCF loops computationally. In this context, transitivity arises when two CTCF anchors each interact with the same third anchor through the loop extrusion mechanism and are thereby brought close to each other spatially, forming an indirect loop.
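The notion of transitivity described above can be illustrated with a small sketch. It is not the attribute definition used by CCIP, which the abstract does not spell out. It assumes, purely for illustration, that loops are given as unordered pairs of anchor identifiers and that a simple transitivity signal is the number of third anchors two anchors share. The function names `shared_partner_counts` and `indirect_loop_candidates` and the anchor labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_partner_counts(loops):
    """For each anchor pair, count the third anchors that both interact with.

    `loops` is an iterable of (anchor, anchor) pairs; this input format is an
    assumption made for illustration, not taken from CCIP.
    """
    partners = defaultdict(set)
    for a, b in loops:
        if a == b:
            continue  # ignore self-interactions
        partners[a].add(b)
        partners[b].add(a)

    counts = defaultdict(int)
    # Every anchor acts as a potential "third anchor" for each pair of its partners.
    for neighbours in partners.values():
        for a, b in combinations(sorted(neighbours), 2):
            counts[(a, b)] += 1
    return dict(counts)


def indirect_loop_candidates(loops):
    """Return pairs that share at least one partner but have no direct loop."""
    direct = {tuple(sorted(pair)) for pair in loops}
    return {
        pair: n
        for pair, n in shared_partner_counts(loops).items()
        if pair not in direct
    }


# Hypothetical anchors: A1 and A2 both loop with A3 and with A4,
# but are not directly connected to each other.
known_loops = [("A1", "A3"), ("A2", "A3"), ("A2", "A4"), ("A1", "A4"), ("A3", "A4")]
print(indirect_loop_candidates(known_loops))
```

In this toy input, A1 and A2 are the only pair without a direct loop that nonetheless share partners, so they are the only candidate for an indirect loop. A count like this could serve as one feature among others in a classifier, but how CCIP actually defines its transitivity attribute should be checked against the paper and source code.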
To determine whether transitivity is informative for predicting CTCF loops and to obtain an accurate, low-cost prediction method, we proposed CTCF-mediated Chromatin Interaction Prediction (CCIP), a two-stage random-forest-based machine learning method for predicting CTCF-mediated chromatin loops. The two-stage learning approach allows us to train a prediction model that uses transitivity-related information together with functional genomics data and genomic data. Experimental studies showed that our method predicts CTCF-mediated loops more accurately than other methods and that transitivity, when used as a properly defined attribute, is informative for predicting CTCF loops. Furthermore, we found that transitivity explains the formation of tandem CTCF loops and facilitates enhancer-promoter interactions. Our work contributes to the understanding of the formation mechanism and function of CTCF-mediated chromatin loops.
The source code of CCIP can be accessed at: https://github.com/GaoLabXDU/CCIP.
Supplementary data are available at Bioinformatics online.