Lun Aaron T L, Smyth Gordon K
The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC, 3052, Melbourne, Australia.
Department of Medical Biology, The University of Melbourne, Parkville, VIC, 3010, Melbourne, Australia.
BMC Bioinformatics. 2015 Aug 19;16:258. doi: 10.1186/s12859-015-0683-0.
Chromatin conformation capture with high-throughput sequencing (Hi-C) is a technique that measures the in vivo intensity of interactions between all pairs of loci in the genome. Most conventional analyses of Hi-C data focus on the detection of statistically significant interactions. However, an alternative strategy involves identifying significant changes in the interaction intensity (i.e., differential interactions) between two or more biological conditions. This differential approach is more statistically rigorous and may provide more biologically relevant results.
Here, we present the diffHic software package for the detection of differential interactions from Hi-C data. diffHic provides methods for read pair alignment and processing, counting into bin pairs, filtering out low-abundance events and normalization of trended or CNV-driven biases. It uses the statistical framework of the edgeR package to model biological variability and to test for significant differences between conditions. Several options for the visualization of results are also included. The use of diffHic is demonstrated with real Hi-C data sets. Performance against existing methods is also evaluated with simulated data.
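The counting and filtering steps mentioned above can be illustrated with a minimal Python sketch. This is not diffHic's implementation (diffHic itself is an R package), and every name below is hypothetical. It assumes that read pairs have already been aligned and reduced to (chromosome, position) anchors, that the genome is split into fixed-size bins, and that "low abundance" means a total count across libraries below a chosen threshold. Normalization and the edgeR-based testing are not shown.

```python
from collections import Counter


def bin_index(pos, bin_size):
    """Map a 1-based genomic coordinate to a 0-based bin index."""
    return (pos - 1) // bin_size


def count_bin_pairs(read_pairs, bin_size):
    """Count read pairs into bin pairs (hypothetical helper).

    Each read pair is (chrom1, pos1, chrom2, pos2). The two anchor bins are
    ordered so that (A, B) and (B, A) are counted as the same bin pair.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in read_pairs:
        a = (chrom1, bin_index(pos1, bin_size))
        b = (chrom2, bin_index(pos2, bin_size))
        key = (a, b) if a <= b else (b, a)
        counts[key] += 1
    return counts


def filter_low_abundance(libraries, min_total):
    """Keep bin pairs whose total count across all libraries is >= min_total.

    Returns a dict mapping each retained bin pair to its per-library counts.
    """
    keys = set().union(*libraries)
    return {
        k: [lib.get(k, 0) for lib in libraries]
        for k in sorted(keys)
        if sum(lib.get(k, 0) for lib in libraries) >= min_total
    }


if __name__ == "__main__":
    # Two toy libraries (e.g. one per biological condition), 1 kb bins.
    lib1 = count_bin_pairs(
        [("chr1", 150, "chr1", 5_200),
         ("chr1", 5_100, "chr1", 900),
         ("chr2", 10, "chr2", 20)],
        bin_size=1_000,
    )
    lib2 = count_bin_pairs([("chr1", 300, "chr1", 5_999)], bin_size=1_000)
    print(filter_low_abundance([lib1, lib2], min_total=3))
```

Here the two chr1 read pairs in the first library land in the same bin pair regardless of anchor order. The chr2 bin pair, with a total of one read, is dropped by the threshold. A simple total-count cutoff is used only for illustration; a real analysis would choose its filter and its bin size to suit the data.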
On real data, diffHic is able to successfully detect interactions with significant differences in intensity between biological conditions. It also compares favourably to existing software tools on simulated data sets. These results suggest that diffHic is a viable approach for differential analyses of Hi-C data.