Department of Computer Science, Ozyegin University, Istanbul, Turkey.
BMC Genomics. 2022 Apr 9;23(1):287. doi: 10.1186/s12864-022-08498-5.
Hi-C and its nucleosome-resolution variant Micro-C provide a window into the 3D spatial packing of a genome within the cell. Although neither technique depends directly on the binding of specific antibodies, previous work has revealed enriched interactions and domain structures around multiple chromatin marks, namely epigenetic modifications and transcription factor binding sites. However, the joint impact of chromatin marks on Hi-C and Micro-C interactions has not been globally characterized, which limits our understanding of 3D genome characteristics. An emerging question is whether 3D genome characteristics and interactions can be deduced by integrative analysis of multiple chromatin marks, and whether interactions can be associated with the function of the interacting loci.
We propose a probabilistic method, PROBC, to decompose Hi-C and Micro-C interactions by known chromatin marks. PROBC is based on convex likelihood optimization, which can directly take into account both the existence and the nonexistence of interactions. Through PROBC, we find that histone modifications (H3K27ac, H3K9me3, H3K4me3, H3K4me1) and CTCF are particularly predictive of Hi-C and Micro-C contacts across cell types and species. Moreover, histone modifications are more effective than transcription factor binding sites in explaining the genome's 3D shape through these interactions. PROBC can successfully predict Hi-C and Micro-C interactions in a given species even when it is trained on different cell types or species. For instance, when trained on mouse ES cells, it can predict missing nucleosome-resolution Micro-C interactions in human ES cells from these 5 chromatin marks alone, with an AUC above 0.75. Additionally, PROBC outperforms existing methods in predicting interactions across almost all chromosomes.
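As a rough illustration of what a convex likelihood over both present and absent interactions can look like, the sketch below fits a plain logistic (Bernoulli) model to synthetic data. It is not the PROBC implementation: the pairwise feature construction, the function names (`pair_features`, `fit`, `auc`), the step size, and the simulated marks and contacts are all assumptions made for this example.

```python
import numpy as np

def pair_features(marks):
    """Hypothetical featurization: symmetrized outer product of the
    mark vectors of the two bins in each pair (i < j)."""
    n_bins, n_marks = marks.shape
    i, j = np.triu_indices(n_bins, k=1)
    outer = marks[i][:, :, None] * marks[j][:, None, :]
    outer = 0.5 * (outer + outer.transpose(0, 2, 1))
    return outer.reshape(len(i), n_marks * n_marks)

def neg_log_likelihood(w, b, X, y):
    """Mean Bernoulli negative log-likelihood; convex in (w, b).
    Pairs with y = 0 (no interaction) contribute as much as y = 1."""
    z = X @ w + b
    return np.mean(np.logaddexp(0.0, z) - y * z)

def fit(X, y, lr=0.1, steps=300):
    """Plain gradient descent on the convex objective above."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def auc(scores, labels):
    """Probability that a random contact outscores a random non-contact."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties

# Synthetic stand-in data: 60 genomic bins, 5 binary chromatin marks.
rng = np.random.default_rng(0)
marks = (rng.random((60, 5)) < 0.3).astype(float)
X = pair_features(marks)
true_w = rng.normal(size=X.shape[1])
true_logits = 2.0 * (X @ true_w) - 1.0
y = (rng.random(len(X)) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)

initial_nll = neg_log_likelihood(np.zeros(X.shape[1]), 0.0, X, y)
w, b = fit(X, y)
final_nll = neg_log_likelihood(w, b, X, y)
model_auc = auc(X @ w + b, y)
print(f"NLL {initial_nll:.3f} -> {final_nll:.3f}, in-sample AUC {model_auc:.3f}")
```

Because the objective is convex, gradient descent with a sufficiently small step cannot get trapped in a poor local optimum. The AUC printed here is computed on the training pairs of simulated data, so it says nothing about the cross-species results reported above.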
With our proposed method, we optimally decompose Hi-C interactions in terms of these chromatin marks at the genome and chromosome levels. We find a subset of histone modifications and transcription factor binding sites to be predictive of both Hi-C and Micro-C interactions and of TADs across human, mouse, and different cell types. Using the learned models, we can predict interactions from chromatin marks alone in species for which Hi-C data may be limited.