Franzini Stefano, Di Stefano Marco, Micheletti Cristian
SISSA - Scuola Internazionale Superiore di Studi Avanzati, Trieste I-34077, Italy.
Structural Genomics Group, CNAG-CRG Centre Nacional d'Análisi Genómica - Centre de Regulació Genómica, Barcelona 08028, Spain.
Bioinformatics. 2021 Aug 9;37(15):2088-2094. doi: 10.1093/bioinformatics/btab062.
Hi-C matrices are cornerstones for qualitative and quantitative studies of genome folding, from its territorial organization to compartments and topological domains. The high dynamic range of genomic distances probed in Hi-C assays is reflected in an inherent stochastic background in the interaction matrices, which inevitably convolves the features of interest with largely non-specific ones.
Here, we introduce and discuss essHi-C, a method to isolate the specific, or essential, component of Hi-C matrices from the non-specific portion of the spectrum that is compatible with random matrices. Systematic comparisons show that essHi-C improves the clarity of the interaction patterns, makes the identification of topologically associating domains more robust to sequencing depth, allows the unsupervised clustering of experiments in different cell lines and recovers the cell-cycle phasing of single cells based on Hi-C data. Thus, essHi-C provides a means of isolating significant biological and physical features from Hi-C matrices.
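The general idea of separating a few dominant spectral modes from a noise-like bulk can be illustrated with a minimal sketch. This is not the essHi-C implementation: the function names `spectral_filter` and `demo`, the choice to rank modes by eigenvalue magnitude, the fixed number of retained modes `n_keep` and the synthetic two-block test matrix are all assumptions made for illustration only.

```python
import numpy as np


def spectral_filter(matrix, n_keep):
    """Rebuild a symmetric matrix from its n_keep largest-magnitude eigenmodes.

    Illustrative only: the ranking criterion and n_keep are assumptions,
    not the selection rule used by essHi-C.
    """
    sym = 0.5 * (matrix + matrix.T)          # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(sym)   # real spectrum of a symmetric matrix
    order = np.argsort(np.abs(eigvals))[::-1][:n_keep]
    kept_vecs = eigvecs[:, order]
    # Sum of lambda_i * v_i v_i^T over the retained modes
    return (kept_vecs * eigvals[order]) @ kept_vecs.T


def demo(n=60, noise_scale=0.5, seed=0):
    """Compare a noisy synthetic matrix and its filtered version to the clean signal."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([1.0, -1.0], n // 2)
    signal = np.outer(labels, labels)        # rank-1, two-block pattern
    noise = rng.normal(scale=noise_scale, size=(n, n))
    noise = 0.5 * (noise + noise.T)          # symmetric random background
    noisy = signal + noise
    filtered = spectral_filter(noisy, n_keep=1)
    err_noisy = np.linalg.norm(noisy - signal)
    err_filtered = np.linalg.norm(filtered - signal)
    return err_noisy, err_filtered


if __name__ == "__main__":
    err_noisy, err_filtered = demo()
    print(f"error before filtering: {err_noisy:.2f}")
    print(f"error after filtering:  {err_filtered:.2f}")
```

In this toy setting the single retained mode carries the block pattern, so the filtered matrix lies closer to the clean signal than the noisy input does. Real Hi-C matrices would additionally need normalization and a principled criterion for how many modes to keep, both of which are outside this sketch.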
The essHi-C software package is available at https://github.com/stefanofranzini/essHIC.
Supplementary data are available at Bioinformatics online.