Gong Haiyan, Zhang Sichen, Zhang Xiaotong, Chen Yang
Beijing Advanced Innovation Center for Materials Genome Engineering, Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, 100083, China.
Shunde Innovation School, University of Science and Technology Beijing, Foshan, 528399, Guangdong, China.
Comput Struct Biotechnol J. 2024 Apr 16;23:1584-1593. doi: 10.1016/j.csbj.2024.04.008. eCollection 2024 Dec.
Multi-scale models of chromatin domains, such as A/B compartments, sub-compartments, topologically associating domains (TADs), sub-TADs, and loops, have been widely used for many years. However, existing methods identify structures at only a single scale and cannot partition structures across multiple scales. In this paper, we propose TORNADOES, a method for chromatin domain partitioning based on hypergraph clustering. First, we use a density clustering algorithm to identify TADs at different scales from Hi-C data at different resolutions. Then, we combine ChIP-seq data features with the TAD results at different scales to generate a hypergraph over these TADs. Finally, we partition the chromatin domain structure at different scales, including A/B, A1, A2, B1, B2, and B3, based on features of the hypergraph's Laplacian matrix. Similarity comparison experiments and ChIP-seq signal enrichment analysis, performed at the A/B region and sub-TAD levels, respectively, demonstrate that our method identifies chromatin domains with distinct features and provides a deeper understanding of the organizational patterns and functional differences of TADs within the hierarchical structure of the genome. Comparative analysis of data from multiple cell lines shows that, compared with other methods, TORNADOES can better classify different numbers and types of compartments by changing the ChIP-seq data and the number of clusters used to characterize TADs. Source code for TORNADOES is available at https://github.com/ghaiyan/TORNADOES.
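The final step, partitioning from the Laplacian of a hypergraph whose vertices are TADs, can be illustrated with a minimal sketch. This is not the TORNADOES implementation: the incidence matrix is a toy example, the function names `hypergraph_laplacian` and `two_way_partition` are hypothetical, and the normalized Laplacian form (I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}) and the sign-based two-way split are assumptions standing in for whatever the authors actually use.

```python
import numpy as np


def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian (assumed form):
    L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}.
    H is a (vertices x hyperedges) 0/1 incidence matrix."""
    H = np.asarray(H, dtype=float)
    n_vertices, n_edges = H.shape
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w             # weighted degree of each vertex
    d_e = H.sum(axis=0)     # number of vertices in each hyperedge
    dv_inv_sqrt = 1.0 / np.sqrt(d_v)
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w / d_e) @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(n_vertices) - theta


def two_way_partition(H, w=None):
    """Split vertices into two groups by the sign of the eigenvector
    belonging to the second-smallest eigenvalue of the Laplacian."""
    L = hypergraph_laplacian(H, w)
    _, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    labels = (eigvecs[:, 1] > 0).astype(int)
    if labels[0] == 1:                  # eigenvector sign is arbitrary;
        labels = 1 - labels             # fix it so vertex 0 gets label 0
    return labels


# Toy hypergraph: 6 TADs (rows) and 5 hyperedges (columns).
# Hyperedges: {0,1,2}, {3,4,5}, {2,3}, {0,1}, {4,5}.
# In practice a hyperedge might group TADs sharing a ChIP-seq feature
# profile or nested at a coarser scale; here they are made up.
H = np.array([
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
])

labels = two_way_partition(H)
print(labels)
```

In this toy case the single hyperedge {2, 3} is the only link between the two groups of TADs, so the split separates TADs 0 to 2 from TADs 3 to 5. Finer partitions such as A1, A2, B1, B2, and B3 would need more eigenvectors and a multi-way clustering step, which this sketch omits.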