MRC Laboratory of Molecular Biology, Hills Rd, CB2 0QH Cambridge, UK.
Nucleic Acids Res. 2011 Mar;39(5):e27. doi: 10.1093/nar/gkq1226. Epub 2010 Dec 3.
The combination of chromatin immunoprecipitation with next-generation sequencing technology (ChIP-seq) is a powerful and increasingly popular method for mapping protein-DNA interactions in a genome-wide fashion. The conventional way of analyzing this data is to identify sequencing peaks along the chromosomes that are significantly higher than the read background. For histone modifications and other epigenetic marks, it is often preferable to find a characteristic region of enrichment in sequencing reads relative to gene annotations. For instance, many histone modifications are typically enriched around transcription start sites. Calculating the optimal window that describes this enrichment allows one to quantify modification levels for each individual gene. Using data sets for the H3K9/14ac histone modification in Th cells and an accompanying IgG control, we present an analysis strategy that alternates between single gene and global data distribution levels and allows a clear distinction between experimental background and signal. Curve fitting permits false discovery rate-based classification of genes as modified versus unmodified. We have developed a software package called EpiChIP that carries out this type of analysis, including integration with and visualization of gene expression data.
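As a rough illustration of the two ideas in the abstract, the sketch below counts reads in a strand-aware window around a transcription start site and then estimates a false discovery rate from two fitted distributions, one for background and one for signal. It is not EpiChIP's implementation. The function names, the window sizes, and the choice of normal components (for example, fitted to log-transformed per-gene signal) are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right
from math import erfc, sqrt


def window_count(read_positions, tss, strand, upstream, downstream):
    """Count reads inside a strand-aware window around a TSS.

    read_positions must be a sorted list of read coordinates on one
    chromosome. Both window edges are inclusive.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    else:
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    return bisect_right(read_positions, end) - bisect_left(read_positions, start)


def normal_tail(x, mean, sd):
    """Return P(X >= x) for a normal distribution."""
    return 0.5 * erfc((x - mean) / (sd * sqrt(2)))


def fdr_at(threshold, background, signal):
    """Estimate the FDR when calling genes at or above `threshold` modified.

    background and signal are (weight, mean, sd) tuples describing the
    two fitted components (hypothetical fit results).
    """
    false_pos = background[0] * normal_tail(threshold, background[1], background[2])
    true_pos = signal[0] * normal_tail(threshold, signal[1], signal[2])
    total = false_pos + true_pos
    return false_pos / total if total > 0 else 0.0


# Toy data, invented for illustration only.
reads = [90, 100, 150, 400, 1200]
plus_count = window_count(reads, tss=200, strand="+", upstream=50, downstream=250)
minus_count = window_count(reads, tss=200, strand="-", upstream=50, downstream=250)

bg = (0.5, 0.0, 1.0)   # assumed background component
sig = (0.5, 4.0, 1.0)  # assumed signal component
loose_fdr = fdr_at(0.0, bg, sig)
strict_fdr = fdr_at(4.0, bg, sig)
```

Raising the threshold lowers the estimated FDR at the cost of calling fewer genes modified. In practice one would scan thresholds and pick the lowest one whose estimated FDR stays under a chosen cutoff.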