School of Computer Science and Engineering, Central South University, Changsha 410083, China.
Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad211.
Hi-C is the most widely used chromosome conformation capture (3C)-based technique. It measures the frequency of pairwise interactions across the entire genome, which makes it a powerful tool for studying the 3D structure of the genome. How finely the genome structure can be reconstructed depends on the resolution of the Hi-C data. However, high-resolution Hi-C data require deep sequencing and therefore high experimental cost, so most available Hi-C data are of low resolution. It is therefore essential to develop effective computational methods that enhance the quality of Hi-C data.
In this work, we propose a novel method called DFHiC, which generates a high-resolution Hi-C matrix from a low-resolution one within a dilated convolutional neural network framework. Dilated convolutions draw on Hi-C matrix entries separated by longer genomic distances, which lets the network effectively capture global patterns across the whole Hi-C matrix. As a result, DFHiC improves the resolution of the Hi-C matrix reliably and accurately. Moreover, the super-resolution Hi-C data produced by DFHiC agree more closely with real high-resolution Hi-C data than those produced by other existing methods, both in terms of significant chromatin interactions and in the identification of topologically associating domains (TADs).
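To illustrate why dilated convolutions reach longer genomic distances, the sketch below implements a naive single-channel 2D dilated convolution in NumPy and measures how far a single contact spreads through a stack of layers. It is not DFHiC's architecture: the helper names (`dilated_conv2d`, `receptive_field`, `impulse_extent`), the 3×3 kernel, the all-ones weights, and the dilation rates 1, 2, 4, 8 are illustrative assumptions.

```python
import numpy as np


def dilated_conv2d(x, kernel, dilation=1):
    """Naive zero-padded 'same' 2D dilated convolution (cross-correlation).

    Hypothetical helper for illustration only.
    x: 2D array, e.g. a patch of a Hi-C contact matrix (bins x bins).
    kernel: square 2D array with an odd side length k.
    dilation: spacing between kernel taps; dilation=1 is an ordinary convolution.
    """
    k = kernel.shape[0]
    # A dilated k x k kernel covers (k - 1) * dilation + 1 bins per axis.
    span = (k - 1) * dilation + 1
    pad = span // 2
    xp = np.pad(x, pad, mode="constant")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Take every `dilation`-th bin inside the footprint: k x k taps.
            window = xp[i:i + span:dilation, j:j + span:dilation]
            out[i, j] = np.sum(window * kernel)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field (in bins per axis) of stacked stride-1 convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def impulse_extent(size, dilations, kernel_size=3):
    """Push one nonzero contact through the stack and count the bins it reaches."""
    x = np.zeros((size, size))
    x[size // 2, size // 2] = 1.0
    kernel = np.ones((kernel_size, kernel_size))  # assumed positive weights
    for d in dilations:
        x = dilated_conv2d(x, kernel, d)
    return int(np.count_nonzero(x[size // 2]))


if __name__ == "__main__":
    print("plain  :", receptive_field(3, [1, 1, 1, 1]), impulse_extent(41, [1, 1, 1, 1]))
    print("dilated:", receptive_field(3, [1, 2, 4, 8]), impulse_extent(41, [1, 2, 4, 8]))
```

With four 3×3 layers, ordinary convolutions see 9 bins along each axis, while dilation rates 1, 2, 4, 8 see 31 bins with the same number of weights. This is the mechanism that lets a dilated network relate interactions across longer genomic distances.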