Gao Ruoying, Ferraro Thomas N, Chen Liang, Zhang Shaoqiang, Chen Yong
College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China.
Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, NJ 08103, USA.
Biology (Basel). 2025 Mar 12;14(3):288. doi: 10.3390/biology14030288.
The 3D organization of chromatin in the nucleus plays a critical role in regulating gene expression and maintaining cellular functions in eukaryotic cells. High-throughput chromosome conformation capture (Hi-C) and its derivative technologies have been developed to map genome-wide chromatin interactions at the population and single-cell levels. However, insufficient sequencing depth and high noise levels in bulk Hi-C data, and particularly in single-cell Hi-C (scHi-C) data, result in low-resolution contact matrices, which limits downstream computational analyses aimed at identifying complex chromosomal organization. To address these challenges, we developed a transformer-based deep learning model, HiCENT, to impute and enhance both scHi-C and Hi-C contact matrices. Validation experiments on large-scale bulk Hi-C and scHi-C datasets demonstrated that HiCENT achieves superior enhancement compared to five popular methods. When applied to real Hi-C data from the GM12878 cell line, HiCENT effectively enhanced 3D structural features at the scales of topologically associated domains and chromosomal loops. Furthermore, when applied to scHi-C data from five human cell lines, it significantly improved clustering performance, outperforming five widely used methods. The adaptability of HiCENT across different datasets and its capacity to improve the quality of chromatin interaction data will facilitate diverse downstream computational analyses in 3D genome research, single-cell studies, and other large-scale omics investigations.