Department of Bioengineering, University of California San Diego, La Jolla, CA, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Nat Protoc. 2020 Mar;15(3):991-1012. doi: 10.1038/s41596-019-0273-0. Epub 2020 Jan 24.
Fit-Hi-C is a software tool that computes statistical confidence estimates for Hi-C contact maps in order to identify significant chromatin contacts. By fitting a monotonically non-increasing spline, Fit-Hi-C captures the relationship between genomic distance and contact probability without any parametric assumption. The spline fit, together with the correction of contact probabilities for bin- or locus-specific biases, accounts for previously characterized covariates that affect Hi-C contact counts. Fit-Hi-C is best suited to the study of mid-range (e.g., 20 kb-2 Mb for the human genome) intra-chromosomal contacts; however, with the latest reimplementation, named FitHiC2, it is possible to perform genome-wide analysis of high-resolution Hi-C data, including all intra-chromosomal distances and inter-chromosomal contacts. FitHiC2 also offers a merging filter module, which eliminates indirect/bystander interactions, leading to a significant reduction in the number of reported contacts without sacrificing recovery of key loops such as those between convergent CTCF binding sites. Here, we describe how to apply the FitHiC2 protocol to three use cases: (i) 5-kb resolution Hi-C data of chromosome 5 from GM12878 (a human lymphoblastoid cell line), (ii) 40-kb resolution whole-genome Hi-C data from IMR90 (a human lung fibroblast cell line), and (iii) budding yeast whole-genome Hi-C data at single restriction cut site (EcoRI) resolution. The procedure takes ~12 h, including preprocessing, when all use cases are run sequentially (~4 h when run in parallel). With the recent improvements in its implementation, FitHiC2 (8 processors and 16 GB memory) also scales to genome-wide analysis of the highest resolution (1 kb) Hi-C data available to date (~48 h with 32 GB peak memory). FitHiC2 is available through Bioconda, GitHub and the Python Package Index.
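The statistical idea summarized above (a monotonically non-increasing distance-to-probability fit, bias correction, and a per-contact confidence estimate) can be illustrated with a minimal sketch. This is not the FitHiC2 implementation: it assumes a pool-adjacent-violators (isotonic) fit in place of the spline, a simple multiplicative bias model, and a binomial tail as the confidence estimate. The function names `fit_non_increasing`, `binomial_tail` and `contact_p_value`, and the toy numbers, are hypothetical.

```python
from math import comb


def fit_non_increasing(values, weights):
    """Weighted pool-adjacent-violators fit constrained to be non-increasing.

    Stand-in for the spline: values are raw contact probabilities per
    distance bin, ordered from shortest to longest genomic distance.
    """
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Merge neighbours while a later block exceeds the one before it.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            merged_mean = (v1 * w1 + v2 * w2) / (w1 + w2)
            blocks.append([merged_mean, w1 + w2, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def contact_p_value(observed, total_contacts, prior, bias_i, bias_j):
    """Assumed model: expected probability = distance prior x both locus biases."""
    p = min(1.0, prior * bias_i * bias_j)
    return binomial_tail(observed, total_contacts, p)


# Toy data: raw probabilities per distance bin (third bin violates monotonicity).
raw = [0.10, 0.06, 0.07, 0.03]
fitted = fit_non_increasing(raw, [1, 1, 1, 1])

# Confidence estimate for a locus pair in the last distance bin.
p_val = contact_p_value(observed=10, total_contacts=100,
                        prior=fitted[3], bias_i=1.0, bias_j=1.0)
```

In this toy example the second and third bins are pooled to a common value so that the fitted curve never increases with distance, and a pair observed far more often than its distance-based expectation receives a small p-value. Direct summation of the binomial tail is only practical for small counts; real data would need a numerically stable routine.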