Centro Andaluz de Biología Molecular y Medicina Regenerativa (CABIMER), CSIC-Universidad de Sevilla-Universidad Pablo de Olavide, Seville, Spain.
Division of Computer Science, Universidad Pablo de Olavide, Seville, Spain.
PLoS Comput Biol. 2021 Jan 19;17(1):e1007814. doi: 10.1371/journal.pcbi.1007814. eCollection 2021 Jan.
DNA topoisomerase II-β (TOP2B) is essential for resolving topological problems linked to DNA metabolism and 3D chromatin architecture, but its cut-and-reseal catalytic mechanism can accidentally cause DNA double-strand breaks (DSBs) that seriously compromise genome integrity. Understanding the factors that determine the genome-wide distribution of TOP2B is therefore essential not only for a complete picture of genome dynamics and organization, but also for assessing the role of TOP2-induced DSBs in the origin of oncogenic translocations and other chromosomal rearrangements. Here, we apply a machine-learning approach to predict TOP2B binding from publicly available sequencing data. We achieve highly accurate predictions, with accessible chromatin and architectural factors being the most informative features. Strikingly, TOP2B binding is sufficiently explained by only three features, DNase I hypersensitivity, CTCF binding and cohesin binding, for which genome-wide data are widely available. Based on this, we develop a predictive model of genome-wide TOP2B binding that can be used across cell lines and species, and generate virtual probability tracks that accurately mirror experimental ChIP-seq data. Our results deepen our understanding of how the accessibility and 3D organization of chromatin determine TOP2B function, and constitute a proof of principle for the in silico prediction of sequence-independent chromatin-binding factors.
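To make the three-feature idea concrete, the following is a minimal sketch of how a classifier could map DNase I, CTCF and cohesin signals in genomic bins to a per-bin binding probability. It is not the authors' pipeline: the abstract does not name the model, so plain logistic regression is assumed here, the data are synthetic, and every name (`make_synthetic_bins`, `fit_logistic`, `probability_track`) and every coefficient is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_bins(n, rng):
    """Generate hypothetical genomic bins.

    Each bin has three signals scaled to [0, 1] (DNase I, CTCF, cohesin)
    and a made-up binary TOP2B label. The coefficients below are an
    assumption for illustration only, not values from the paper.
    """
    features, labels = [], []
    for _ in range(n):
        dnase, ctcf, cohesin = rng.random(), rng.random(), rng.random()
        p = sigmoid(4.0 * dnase + 3.0 * ctcf + 3.0 * cohesin - 5.0)
        features.append((dnase, ctcf, cohesin))
        labels.append(1 if rng.random() < p else 0)
    return features, labels


def fit_logistic(features, labels, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    w = [0.0, 0.0, 0.0]
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0, 0.0, 0.0]
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(3):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probability_track(features, w, b):
    """Return one predicted binding probability per bin (a 'virtual track')."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in features]


rng = random.Random(0)
train_x, train_y = make_synthetic_bins(1000, rng)
test_x, test_y = make_synthetic_bins(500, rng)

w, b = fit_logistic(train_x, train_y)
track = probability_track(test_x, w, b)
accuracy = sum((p >= 0.5) == bool(y) for p, y in zip(track, test_y)) / len(test_y)
print(f"weights={w}, bias={b:.2f}, held-out accuracy={accuracy:.2f}")
```

In a real setting the three inputs would come from experimental signal files for the cell type of interest, and the list returned by `probability_track` would be written out in a genome-browser format so it could be compared with a TOP2B ChIP-seq track.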