School of Information Systems, Computing and Mathematics, Brunel University, London, UK.
BMC Bioinformatics. 2013 May 30;14:169. doi: 10.1186/1471-2105-14-169.
Immunoprecipitation (IP) efficiencies may vary widely between different antibodies and between repeated experiments with the same antibody. These differences have a large impact on the quality of ChIP-seq data: a more efficient experiment will necessarily yield a higher signal-to-background ratio, and therefore an apparently larger number of enriched regions, than a less efficient experiment. In this paper, we show how IP efficiencies can be explicitly accounted for in the joint statistical modelling of ChIP-seq data.
We fit a latent mixture model to eight experiments on two proteins, from two laboratories, with different antibodies used for the two proteins. We use the model parameters to estimate the efficiencies of individual experiments, and find that these clearly differ between the laboratories and among technical replicates from the same laboratory. When we account for ChIP efficiency, we find more bound regions in the more efficient experiments than in the less efficient ones, at the same false discovery rate. A priori knowledge that the number of binding sites is the same across experiments can also be included in the model, for more robust detection of regions differentially bound between the two proteins.
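As a rough illustration of the idea, and not the model fitted in the paper, the sketch below fits a two-component Poisson mixture (background versus enriched) to simulated per-region read counts by EM, then calls enriched regions at a fixed false discovery rate from the posterior probabilities. Everything here is an assumption made for illustration: the Poisson components, the simulated counts, the function names `fit_poisson_mixture` and `call_enriched`, and the use of the fitted signal-to-background ratio as a crude stand-in for IP efficiency.

```python
import numpy as np


def fit_poisson_mixture(counts, n_iter=500, tol=1e-9):
    """EM for a two-component Poisson mixture (hypothetical helper).

    Returns (p, lam_bg, lam_sig, resp): the enriched fraction, the two
    component means, and each region's posterior probability of enrichment.
    """
    counts = np.asarray(counts, dtype=float)
    # Crude starting values: median for background, upper tail for signal.
    lam_bg = max(float(np.median(counts)), 0.5)
    lam_sig = max(float(np.quantile(counts, 0.95)), lam_bg + 1.0)
    p = 0.1
    for _ in range(n_iter):
        # E-step: the log(k!) term is shared by both components and cancels.
        log_sig = np.log(p) + counts * np.log(lam_sig) - lam_sig
        log_bg = np.log1p(-p) + counts * np.log(lam_bg) - lam_bg
        resp = np.exp(log_sig - np.logaddexp(log_sig, log_bg))
        # M-step: weighted proportion and weighted means.
        p_new = float(np.clip(resp.mean(), 1e-6, 1 - 1e-6))
        lam_sig_new = max(float((resp * counts).sum() / max(resp.sum(), 1e-12)), 1e-8)
        lam_bg_new = max(float(((1 - resp) * counts).sum() / max((1 - resp).sum(), 1e-12)), 1e-8)
        change = abs(p_new - p) + abs(lam_sig_new - lam_sig) + abs(lam_bg_new - lam_bg)
        p, lam_sig, lam_bg = p_new, lam_sig_new, lam_bg_new
        if change < tol:
            break
    return p, lam_bg, lam_sig, resp


def call_enriched(resp, fdr=0.05):
    """Call the largest set of regions whose mean posterior error is <= fdr."""
    local_fdr = 1.0 - resp
    order = np.argsort(local_fdr)
    running = np.cumsum(local_fdr[order]) / np.arange(1, resp.size + 1)
    n_called = int(np.sum(running <= fdr))  # running mean is non-decreasing
    called = np.zeros(resp.size, dtype=bool)
    called[order[:n_called]] = True
    return called


# Simulated data (assumption): the same true binding sites in both experiments,
# but a stronger signal in the "high efficiency" one.
rng = np.random.default_rng(0)
n_regions = 5000
enriched = rng.random(n_regions) < 0.1


def simulate(lam_sig, lam_bg=2.0):
    return np.where(enriched, rng.poisson(lam_sig, n_regions), rng.poisson(lam_bg, n_regions))


fit_high = fit_poisson_mixture(simulate(20.0))
fit_low = fit_poisson_mixture(simulate(6.0))

# Signal-to-background ratio, used here only as a proxy for efficiency.
ratio_high = fit_high[2] / fit_high[1]
ratio_low = fit_low[2] / fit_low[1]
n_high = int(call_enriched(fit_high[3]).sum())
n_low = int(call_enriched(fit_low[3]).sum())
print("signal/background:", ratio_high, ratio_low)
print("regions called at FDR 0.05:", n_high, n_low)
```

In this toy setting both experiments share the same true sites, yet the weaker experiment yields fewer calls at the same false discovery rate. Fitting each experiment separately, as done here, is the simplification that a joint model across replicates and proteins is meant to improve on.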
We propose a statistical model for the detection of enriched and differentially bound regions from multiple ChIP-seq data sets. The framework that we present accounts explicitly for IP efficiencies in ChIP-seq data, and allows replicates and experiments on different proteins to be modelled jointly rather than individually, leading to more robust biological conclusions.