Li Qinyao, Li Kelly Yichen, Nicoletti Chiara, Puri Pier Lorenzo, Cao Qin, Yip Kevin Y
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR.
Cancer Genome and Epigenetics Program, NCI-Designated Cancer Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA.
bioRxiv. 2024 Oct 24:2024.10.21.619560. doi: 10.1101/2024.10.21.619560.
Computational enhancement is an important strategy for inferring high-resolution features from genome-wide chromosome conformation capture (Hi-C) data, which typically have limited resolution. Deep learning has been highly successful at this task, but we show that it creates prevalent artificial structures in the enhanced data because the large contact matrix must be divided into small patches. In addition, previous deep learning methods focus largely on local patterns and therefore cannot fully capture the complexity of Hi-C data. Here we propose Smooth, High-resolution, and Accurate Reconstruction of Patterns (SHARP) for enhancing Hi-C data. SHARP takes the novel approach of decomposing the data into three types of signals, arising from one-dimensional proximity, contiguous domains, and other fine structures, and applies deep learning only to the third type, so that enhancement of the first two is unaffected by patch division. For the deep learning part, SHARP uses both local and global attention mechanisms to capture multi-scale contextual information. We compare SHARP extensively with state-of-the-art methods, including on data from new samples and another species, and show that SHARP performs better in resolution enhancement accuracy, avoidance of artificial structures, identification of significant interactions, and enrichment in chromatin states.
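The three-way decomposition can be illustrated with a toy sketch. This is not SHARP's implementation: it assumes the one-dimensional proximity signal is the mean contact count along each diagonal (the usual distance-decay "expected" background), that domain boundaries are already known, and that each domain contributes a constant block of the remaining signal. The functions `distance_expected`, `domain_signal`, and `decompose` are hypothetical names chosen for illustration. Whatever is left over is the fine-structure residual, which in this sketch is the only component a patch-based model would need to enhance.

```python
import numpy as np


def distance_expected(mat: np.ndarray) -> np.ndarray:
    """Assumed 1D-proximity signal: mean contact count on each diagonal (genomic distance)."""
    n = mat.shape[0]
    expected = np.zeros_like(mat, dtype=float)
    for d in range(n):
        diag_mean = np.diagonal(mat, offset=d).mean()
        idx = np.arange(n - d)
        # Fill both the upper and lower diagonal to keep the matrix symmetric.
        expected[idx, idx + d] = diag_mean
        expected[idx + d, idx] = diag_mean
    return expected


def domain_signal(residual: np.ndarray, boundaries: list[int]) -> np.ndarray:
    """Hypothetical domain signal: mean residual within each intra-domain block.

    `boundaries` are assumed bin indices where one domain ends and the next begins.
    """
    out = np.zeros_like(residual, dtype=float)
    edges = [0, *boundaries, residual.shape[0]]
    for start, end in zip(edges[:-1], edges[1:]):
        out[start:end, start:end] = residual[start:end, start:end].mean()
    return out


def decompose(mat: np.ndarray, boundaries: list[int]):
    """Split a symmetric contact matrix into proximity, domain, and fine-structure parts."""
    proximity = distance_expected(mat)
    domains = domain_signal(mat - proximity, boundaries)
    fine = mat - proximity - domains  # residual fine structures
    return proximity, domains, fine


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.poisson(5, size=(8, 8)).astype(float)
    hic = raw + raw.T  # symmetric toy contact matrix
    prox, dom, fine = decompose(hic, boundaries=[4])
    # The three components add back up to the input matrix.
    print(np.allclose(prox + dom + fine, hic))
```

Because the first two components in this sketch are computed over the whole matrix at once, they contain no patch boundaries; only the residual would ever need to be tiled.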