Groningen Bioinformatics Centre, University of Groningen, Kerklaan 30, Biologisch Centrum, 9751 NN Haren, The Netherlands.
Bioinformatics. 2010 Apr 15;26(8):1000-6. doi: 10.1093/bioinformatics/btq087. Epub 2010 Mar 5.
ChIP-chip and ChIP-seq technologies provide genome-wide measurements of various types of chromatin marks at unprecedented resolution. With ChIP samples collected from different tissue types and/or individuals, we can now begin to characterize stochastic or systematic changes in epigenetic patterns during development (intra-individual) or at the population level (inter-individual). This requires statistical methods that permit a simultaneous comparison of multiple ChIP samples on a global as well as a locus-specific scale. Current analytical approaches are mainly geared toward single-sample investigations and therefore have limited applicability in this comparative setting. This shortcoming is a bottleneck in the biological interpretation of multiple-sample data.
To address this limitation, we introduce a parametric classification approach for the simultaneous analysis of two (or more) ChIP samples. We consider several competing models that reflect alternative biological assumptions about the global distribution of the data. Inferences about locus-specific and genome-wide chromatin differences are reached through the estimation of multivariate mixtures. Parameter estimates are obtained using an incremental version of the Expectation-Maximization algorithm (incremental EM, IEM). We demonstrate that the approach scales efficiently and apply it to three very diverse ChIP-chip and ChIP-seq experiments. The proposed approach is evaluated against several published ChIP-chip and ChIP-seq software packages. We recommend its use as a first-pass algorithm to identify candidate regions in the epigenome, possibly followed by a second-pass algorithm that fine-tunes the detected peaks according to biological or technological criteria.
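As a rough illustration of the estimation step described above, the following Python sketch fits a multivariate mixture to two ChIP samples with an incremental EM scheme. It is not the authors' R implementation. The Gaussian component densities, the block size, the rank-based initialization, and the names `log_gauss` and `incremental_em` are all assumptions made for this example. Each row of `x` is taken to hold the signal of one locus in the two samples. Sufficient statistics are refreshed after every block of loci rather than after a full pass over the data, which is the idea behind incremental EM.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of a multivariate normal at each row of x."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    # Whitened residuals; their squared norm is the Mahalanobis distance.
    sol = np.linalg.solve(chol, (x - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(sol**2, axis=0))


def incremental_em(x, k=2, block=500, sweeps=10):
    """Hypothetical sketch: incremental EM for a k-component Gaussian mixture."""
    n, d = x.shape

    # Assumed initialization: rank loci by total signal, split into k groups.
    ranks = np.argsort(np.argsort(x.sum(axis=1)))
    resp = np.zeros((n, k))
    resp[np.arange(n), (ranks * k) // n] = 1.0

    # Sufficient statistics: component mass, weighted sums, weighted outer products.
    s0 = resp.sum(axis=0)
    s1 = resp.T @ x
    s2 = np.einsum("nk,ni,nj->kij", resp, x, x)

    def params():
        weights = s0 / n
        means = s1 / s0[:, None]
        covs = s2 / s0[:, None, None] - np.einsum("ki,kj->kij", means, means)
        covs = covs + 1e-6 * np.eye(d)  # small ridge for numerical stability
        return weights, means, covs

    for _ in range(sweeps):
        for start in range(0, n, block):
            idx = slice(start, start + block)
            xb = x[idx]
            weights, means, covs = params()

            # Partial E-step: responsibilities for this block only.
            log_p = np.stack(
                [np.log(weights[j]) + log_gauss(xb, means[j], covs[j])
                 for j in range(k)],
                axis=1,
            )
            log_p -= log_p.max(axis=1, keepdims=True)
            new = np.exp(log_p)
            new /= new.sum(axis=1, keepdims=True)

            # Swap the block's old contribution for its new one.
            delta = new - resp[idx]
            s0 += delta.sum(axis=0)
            s1 += delta.T @ xb
            s2 += np.einsum("nk,ni,nj->kij", delta, xb, xb)
            resp[idx] = new

    weights, means, covs = params()
    return weights, means, covs, resp
```

Under these assumptions, the rows of `resp` give locus-specific posterior class probabilities, while `weights` summarizes the genome-wide proportion of loci in each class. A component whose mass shrinks to zero is not handled in this sketch.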
R source code is available at http://gbic.biol.rug.nl/supplementary/2009/ChromatinProfiles/. Access to ChIP-seq data: GEO repository GSE17937.