Computational Biology Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA.
Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York 10065, USA.
Nat Commun. 2017 May 17;8:15454. doi: 10.1038/ncomms15454.
Here we present HiC-DC, a principled method to estimate the statistical significance (P values) of chromatin interactions from Hi-C experiments. HiC-DC uses hurdle negative binomial regression to account for systematic sources of variation in Hi-C read counts (for example, distance-dependent random polymer ligation, GC content and mappability bias) and to model zero inflation and overdispersion. Applied to high-resolution Hi-C data in a lymphoblastoid cell line, HiC-DC detects significant interactions at the sub-topologically associating domain level, identifying potential structural and regulatory interactions supported by CTCF binding sites, DNase accessibility, and/or active histone marks. CTCF-associated interactions are most strongly enriched in the middle genomic distance range (~700 kb to 1.5 Mb), while interactions involving actively marked DNase accessible elements are enriched at both short (<500 kb) and longer (>1.5 Mb) genomic distances. There is a striking enrichment of longer-range interactions connecting replication-dependent histone genes on chromosome 6, potentially representing the chromatin architecture at the histone locus body.
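To make the hurdle negative binomial idea in the abstract concrete, the following is a minimal Python sketch of how such a model assigns probabilities to counts and how an upper-tail P value could be derived from it. It is an illustration under stated assumptions, not the HiC-DC implementation: the function names and the parameter values (`mu`, `size`, `p_zero`) are hypothetical, and in a regression setting the mean and the zero probability would be predicted from covariates such as genomic distance, GC content and mappability rather than fixed by hand. Whether HiC-DC computes its P values from the full hurdle distribution or from the count component alone is not stated in the abstract; this sketch uses the full distribution.

```python
import math


def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu > 0 and dispersion parameter size > 0.

    Parameterized so that the variance is mu + mu**2 / size.
    """
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def hurdle_nb_pmf(k, mu, size, p_zero):
    """Hurdle negative binomial pmf.

    Zeros occur with probability p_zero; positive counts follow a
    zero-truncated negative binomial scaled by (1 - p_zero).
    """
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * nb_pmf(k, mu, size) / (1.0 - nb_pmf(0, mu, size))


def hurdle_nb_upper_tail(k, mu, size, p_zero):
    """Upper-tail probability P(Y >= k) under the hurdle model."""
    if k <= 0:
        return 1.0
    lower = sum(hurdle_nb_pmf(j, mu, size, p_zero) for j in range(k))
    return max(0.0, 1.0 - lower)


if __name__ == "__main__":
    # Hypothetical values for one pair of genomic bins (assumptions, not fitted).
    expected_mean = 2.0   # mean of the untruncated count component
    dispersion = 1.5      # smaller values mean stronger overdispersion
    zero_prob = 0.6       # probability that the bin pair has no reads
    observed = 12         # observed read count for the bin pair
    p_value = hurdle_nb_upper_tail(observed, expected_mean, dispersion, zero_prob)
    print(f"P(Y >= {observed}) = {p_value:.3g}")
```

Separating the zero probability from the positive-count distribution is what lets a hurdle model handle zero inflation, while the dispersion parameter of the negative binomial handles overdispersion among the nonzero counts.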