Dang Dachang, Zhang Shao-Wu, Dong Kangning, Duan Ran, Zhang Shihua
Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, 1 DongXiang Road, Chang'an District, Xi'an 710072, China.
NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 55 Zhongguancun East Road, Haidian District, Beijing 100190, China.
Nucleic Acids Res. 2025 Feb 8;53(4). doi: 10.1093/nar/gkae1267.
Topologically associating domains (TADs) are essential components of three-dimensional (3D) genome organization and significantly influence gene transcription regulation. However, accurately identifying TADs from sparse chromatin contact maps and exploring the structural and functional elements within TADs remain challenging. To this end, we developed TADGATE, a graph attention auto-encoder that generates imputed maps from sparse Hi-C contact maps while adaptively preserving or enhancing the underlying topological structures, thereby facilitating TAD identification. TADGATE captures specific attention patterns with two types of units within TADs and demonstrates that TAD organization relates to chromatin compartmentalization with diverse biological properties. We identified many structural and functional elements within TADs, whose abundance reflects the overall properties of these domains. We applied TADGATE to sparse and noisy Hi-C contact maps from 21 human tissues or cell lines. This improved the clarity of TAD structures, allowing us to investigate conserved and cell-type-specific boundaries and uncover cell-type-specific transcriptional regulatory mechanisms associated with topological domains. We also demonstrated TADGATE's capability to fill in sparse single-cell Hi-C contact maps and identify TAD-like domains within them, revealing specific domain boundaries with distinct heterogeneity and shared backbone boundaries characterized by strong CTCF enrichment and high gene expression levels.
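To make the idea of graph attention over a Hi-C contact map more concrete, the following is a minimal sketch of a single graph attention layer forward pass in NumPy. It is not the TADGATE implementation: the toy contact map, the band-shaped neighbor graph, the random untrained weights, and the names `band_adjacency` and `gat_layer` are all assumptions made for illustration. Genomic bins are treated as nodes, each bin's row of contacts as its feature vector, and bins within a fixed genomic distance as neighbors.

```python
import numpy as np


def band_adjacency(n_bins, max_dist):
    """Hypothetical neighbor graph: connect bins at most max_dist bins apart.

    Distance 0 is included, so every bin is its own neighbor.
    """
    idx = np.arange(n_bins)
    return np.abs(idx[:, None] - idx[None, :]) <= max_dist


def gat_layer(features, adjacency, weight, attn_src, attn_dst, slope=0.2):
    """One single-head graph attention forward pass (weights are untrained)."""
    h = features @ weight  # project node features, shape (n_bins, d_out)
    # Additive attention score e_ij = LeakyReLU(a_src . h_i + a_dst . h_j)
    scores = (h @ attn_src)[:, None] + (h @ attn_dst)[None, :]
    scores = np.where(scores > 0, scores, slope * scores)
    # Mask non-neighbors, then apply a numerically stable row-wise softmax
    scores = np.where(adjacency, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    unnormalized = np.exp(scores)
    alpha = unnormalized / unnormalized.sum(axis=1, keepdims=True)
    # Each bin's embedding is the attention-weighted sum of its neighbors
    return alpha @ h, alpha


# Toy 8-bin contact map with two block-like domains (made-up values)
rng = np.random.default_rng(0)
n_bins, d_out = 8, 4
contact = rng.poisson(1.0, size=(n_bins, n_bins)).astype(float)
contact[:4, :4] += 5.0
contact[4:, 4:] += 5.0
contact = (contact + contact.T) / 2.0  # contact maps are symmetric

adjacency = band_adjacency(n_bins, max_dist=2)
weight = rng.normal(size=(n_bins, d_out))
attn_src = rng.normal(size=d_out)
attn_dst = rng.normal(size=d_out)

embeddings, alpha = gat_layer(contact, adjacency, weight, attn_src, attn_dst)
print(embeddings.shape)
print(np.round(alpha[0], 3))
```

In an auto-encoder setting, a decoder would map such embeddings back to a reconstructed contact map and the weights would be trained to minimize reconstruction error; that training loop is omitted here. The attention matrix `alpha` is what one would inspect to see which neighboring bins each bin draws on.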