Saberi Saeed, Farré Pau, Cuvier Olivier, Emberly Eldon
Physics Department, Simon Fraser University, 8888 University Drive, Burnaby, V5A 1S6, BC, Canada.
Laboratoire de Biologie Moléculaire des Eucaryotes (LBME), Toulouse, France.
BMC Bioinformatics. 2015 May 23;16:171. doi: 10.1186/s12859-015-0584-2.
A variety of DNA-binding proteins regulate and shape the packing of chromatin. They help form loops in the DNA that isolate different structural domains. Hi-C, a recent experimental technique, measures the frequency of such looping between all distant parts of the genome. Because the binding locations of many chromatin-associated proteins have also been measured, it has been possible to estimate their influence on the long-range interactions measured by Hi-C. A challenge in this analysis, however, is the predominance of non-specific contacts, which mask the specific interactions of interest.
We show that transforming the Hi-C contact frequencies into free energies gives a natural method for separating out the distance-dependent non-specific interactions. In particular, we apply Principal Component Analysis (PCA) to the transformed free energy matrix to identify the dominant modes of interaction. PCA identifies systematic effects as well as high-frequency spatial noise in the Hi-C data, both of which can be filtered out. It can therefore serve as a data-driven approach for normalizing Hi-C data. We assess this PCA-based normalization, along with several other normalization schemes, by fitting the transformed Hi-C data with a pairwise interaction model that takes as input the known locations of bound chromatin factors. The fit yields a set of predictions for the coupling energies between the various chromatin factors and their effect on the energetics of looping. We show that the quality of the fit can be used to determine how much PCA filtering should be applied to the Hi-C data.
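The transformation and filtering steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the pseudocount, the choice of F = -log(p) in units of kT, the per-diagonal mean as the distance-dependent background, and the row-wise PCA via SVD are all assumptions made for the example.

```python
import numpy as np

def contacts_to_free_energy(counts, pseudocount=1.0):
    """Convert a contact count matrix to free energies, F_ij = -log(p_ij).

    Energies are in units of kT. The pseudocount (an assumption of this
    sketch) avoids log(0) for bin pairs with no observed contacts.
    """
    counts = np.asarray(counts, dtype=float) + pseudocount
    freq = counts / counts.sum()
    return -np.log(freq)

def remove_distance_trend(F):
    """Subtract the mean free energy at each genomic separation |i - j|.

    Assumes F is symmetric; the per-diagonal mean stands in for the
    distance-dependent non-specific contribution.
    """
    n = F.shape[0]
    out = F.copy()
    for d in range(n):
        i = np.arange(n - d)
        j = i + d
        mean_d = F[i, j].mean()
        out[i, j] -= mean_d
        if d > 0:
            out[j, i] -= mean_d
    return out

def pca_filter(F, keep):
    """Reconstruct F from a subset of principal components.

    Rows are treated as observations. `keep` is a slice or index array over
    components ordered by decreasing singular value, so leading (systematic)
    and trailing (noise-like) components can both be dropped.
    """
    mean = F.mean(axis=0)
    U, S, Vt = np.linalg.svd(F - mean, full_matrices=False)
    return (U[:, keep] * S[keep]) @ Vt[keep] + mean
```

For example, `pca_filter(remove_distance_trend(contacts_to_free_energy(counts)), slice(1, 20))` would drop the first component and everything past the twentieth. Which components to keep is exactly the choice the abstract proposes to make by checking the quality of the downstream fit.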
We find that the different normalizations of the Hi-C data vary in the quality of fit to the pairwise interaction model. PCA filtering can improve the fit, and the predicted coupling energies lead to biologically meaningful insights into how various chromatin-bound factors influence the stability of DNA loops in chromatin.
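To make the fitting step concrete, here is a hedged sketch of one possible pairwise interaction model. The abstract does not give the model's functional form, so the bilinear form F_ij ≈ Σ_ab x_ia J_ab x_jb, the ordinary least-squares fit, and the names `fit_pairwise_couplings`, `X`, and `J` are assumptions for illustration only. `X` is taken to be a bins-by-factors occupancy matrix and `J` a symmetric matrix of coupling energies.

```python
import numpy as np

def fit_pairwise_couplings(F, X):
    """Least-squares fit of symmetric couplings J in F_ij ~ x_i^T J x_j.

    F : (n, n) symmetric free energy matrix (e.g. after normalization).
    X : (n, k) occupancy of k chromatin factors in each of n genomic bins.
    Returns (J, r2), where r2 is the coefficient of determination over
    all bin pairs i < j.
    """
    F = np.asarray(F, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    iu, ju = np.triu_indices(n, k=1)   # bin pairs with i < j
    a, b = np.triu_indices(k)          # factor pairs with a <= b

    # One feature per unordered factor pair: x_ia * x_jb, symmetrized for a != b.
    feats = X[iu][:, a] * X[ju][:, b]
    off = a != b
    feats[:, off] += X[iu][:, b[off]] * X[ju][:, a[off]]

    y = F[iu, ju]
    coef, *_ = np.linalg.lstsq(feats, y, rcond=None)

    J = np.zeros((k, k))
    J[a, b] = coef
    J[b, a] = coef

    resid = y - feats @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return J, r2
```

Under these assumptions, running the fit on differently normalized or PCA-filtered versions of the same free energy matrix and comparing `r2` mirrors the comparison described above, where fit quality guides how much filtering to apply.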