Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA.
Bioinformatics. 2019 Sep 1;35(17):2916-2923. doi: 10.1093/bioinformatics/btz048.
With the development of chromatin conformation capture technology and its high-throughput derivative Hi-C sequencing, studies of the three-dimensional interactome of the genome that involve multiple Hi-C datasets are becoming available. To account for the technology-driven biases unique to each dataset, there is a distinct need for methods to jointly normalize multiple Hi-C datasets. Previous attempts at removing biases from Hi-C data have made use of techniques which normalize individual Hi-C datasets, or, at best, jointly normalize two datasets.
Here, we present multiHiCcompare, a cyclic loess regression-based joint normalization technique for removing biases across multiple Hi-C datasets. In contrast to other normalization techniques, it properly handles the Hi-C-specific decay of chromatin interaction frequencies with the increasing distance between interacting regions. multiHiCcompare uses the general linear model framework for comparative analysis of multiple Hi-C datasets, adapted for the Hi-C-specific decay of chromatin interaction frequencies. multiHiCcompare outperforms other methods when detecting a priori known chromatin interaction differences from jointly normalized datasets. Applied to the analysis of auxin-treated versus untreated experiments, and CTCF depletion experiments, multiHiCcompare was able to recover the expected epigenetic and gene expression signatures of loss of chromatin interactions and reveal novel insights.
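The cyclic loess idea described above can be illustrated with a minimal Python sketch. This is not the package's R implementation: the function names `loess_by_distance` and `cyclic_loess`, the pseudo-count of 1, the span of 0.5 and the fixed number of cycles are all assumptions made for illustration. For every pair of datasets, the log2 ratio M of interaction frequencies is smoothed against genomic distance D, and half of the fitted trend is removed from each dataset, so the correction follows the distance-dependent decay.

```python
from itertools import combinations

import numpy as np


def loess_by_distance(distance, m, span=0.5):
    """Local linear fit of M on distance with tricube weights (simplified loess)."""
    distance = np.asarray(distance, dtype=float)
    m = np.asarray(m, dtype=float)
    k = max(2, int(np.ceil(span * distance.size)))  # points used per local fit
    fitted = np.empty_like(m)
    for d0 in np.unique(distance):
        gap = np.abs(distance - d0)
        h = np.partition(gap, k - 1)[k - 1]  # bandwidth: k-th smallest gap
        if h > 0:
            w = np.clip(1.0 - (gap / h) ** 3, 0.0, None) ** 3  # tricube weights
        else:
            w = (gap == 0).astype(float)  # all k neighbours share this distance
        sw = w.sum()
        mx = (w * distance).sum() / sw
        my = (w * m).sum() / sw
        sxx = (w * (distance - mx) ** 2).sum()
        # Fall back to a weighted mean when the local slope is not identifiable.
        slope = (w * (distance - mx) * (m - my)).sum() / sxx if sxx > 1e-12 else 0.0
        fitted[distance == d0] = my + slope * (d0 - mx)
    return fitted


def cyclic_loess(counts, distance, span=0.5, n_iter=3):
    """Jointly normalize interaction frequencies across samples.

    counts:   array of shape (n_interactions, n_samples)
    distance: array of shape (n_interactions,), unit genomic distance per row
    """
    # Pseudo-count of 1 is an assumption so that zero counts can be logged.
    log_if = np.log2(np.asarray(counts, dtype=float) + 1.0)
    n_samples = log_if.shape[1]
    for _ in range(n_iter):
        for i, j in combinations(range(n_samples), 2):
            m = log_if[:, j] - log_if[:, i]  # M = log2(IF_j / IF_i)
            trend = loess_by_distance(distance, m, span)
            log_if[:, i] += trend / 2.0  # pull both samples toward each other
            log_if[:, j] -= trend / 2.0
    return np.maximum(2.0 ** log_if - 1.0, 0.0)
```

Calling `cyclic_loess(counts, distance)` on a matrix with one column per dataset returns a matrix of the same shape in which pairwise log ratios are centered near zero at each distance. Each pairwise step undoes part of the previous ones, which is why the pairs are cycled through several times.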
multiHiCcompare is freely available on GitHub and as a Bioconductor R package at https://bioconductor.org/packages/multiHiCcompare.
Supplementary data are available at Bioinformatics online.