Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.
Department of Statistics, Stanford University, Stanford, CA, USA.
Bioinformatics. 2018 Aug 15;34(16):2701-2707. doi: 10.1093/bioinformatics/bty164.
The three-dimensional organization of chromatin plays a critical role in gene regulation and disease. High-throughput chromosome conformation capture experiments such as Hi-C are used to obtain genome-wide maps of three-dimensional chromatin contacts. However, robust estimation of data quality and systematic comparison of these contact maps are challenging because of the multi-scale, hierarchical structure of chromatin contacts and the resulting properties of experimental noise in the data. Measuring concordance of contact maps is important for assessing reproducibility of replicate experiments and for modeling variation between different cellular contexts.
We introduce a concordance measure called DIfferences between Smoothed COntact maps (GenomeDISCO) for assessing the similarity of a pair of contact maps obtained from chromosome conformation capture experiments. The key idea is to smooth contact maps using random walks on the contact map graph, before estimating concordance. We use simulated datasets to benchmark GenomeDISCO's sensitivity to different types of noise that affect chromatin contact maps. When applied to a large collection of Hi-C datasets, GenomeDISCO accurately distinguishes biological replicates from samples obtained from different cell types. GenomeDISCO also generalizes to other chromosome conformation capture assays, such as HiChIP.
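The smoothing idea can be illustrated with a minimal sketch. It is not the GenomeDISCO implementation. It assumes that each contact map is a small dense symmetric matrix, that smoothing means row-normalizing the map into a transition matrix and taking a fixed number of random-walk steps, and that the score is one minus the L1 difference between the smoothed maps divided by the average number of non-empty rows. The function names and the default `steps=3` are hypothetical choices made for this example.

```python
import numpy as np


def to_transition_matrix(contacts):
    """Row-normalize a contact map so each non-empty row sums to 1."""
    contacts = np.asarray(contacts, dtype=float)
    row_sums = contacts.sum(axis=1, keepdims=True)
    # Rows with no contacts stay all-zero instead of dividing by zero.
    safe_sums = np.where(row_sums == 0, 1.0, row_sums)
    return contacts / safe_sums


def smooth(contacts, steps=3):
    """Smooth a contact map by taking `steps` random-walk steps on its graph."""
    transition = to_transition_matrix(contacts)
    return np.linalg.matrix_power(transition, steps)


def concordance(map_a, map_b, steps=3):
    """Illustrative score: 1 minus the scaled L1 difference of smoothed maps."""
    map_a = np.asarray(map_a, dtype=float)
    map_b = np.asarray(map_b, dtype=float)
    difference = np.abs(smooth(map_a, steps) - smooth(map_b, steps)).sum()
    # Scale by the average number of rows (bins) that have any contact.
    nonzero_rows = 0.5 * ((map_a.sum(axis=1) > 0).sum()
                          + (map_b.sum(axis=1) > 0).sum())
    if nonzero_rows == 0:
        return 1.0  # Assumption: two empty maps are treated as identical.
    return 1.0 - difference / nonzero_rows


# Toy 3-bin contact maps (made-up counts, for illustration only).
replicate_1 = np.array([[0, 5, 1], [5, 0, 2], [1, 2, 0]])
replicate_2 = np.array([[0, 1, 5], [1, 0, 2], [5, 2, 0]])

same_score = concordance(replicate_1, replicate_1)
cross_score = concordance(replicate_1, replicate_2)
```

Under these assumptions, comparing a map with itself gives a score of exactly 1, and maps that differ more after smoothing give lower scores. Increasing `steps` spreads each contact over a wider neighborhood, which trades sensitivity to fine differences for robustness to sparse, noisy counts.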
Software implementing GenomeDISCO is available at https://github.com/kundajelab/genomedisco.
Supplementary data are available at Bioinformatics online.