Biostatistics Branch, National Institute of Environmental Health Sciences, RTP, NC 27709, USA.
Nucleic Acids Res. 2011 Oct;39(19):e130. doi: 10.1093/nar/gkr592. Epub 2011 Jul 29.
We propose a new and effective statistical framework for identifying genome-wide differential changes in epigenetic marks from ChIP-seq data, or in gene expression from mRNA-seq data, and we present a new software tool, EpiCenter, that performs the analysis efficiently. The key features of the framework are: (i) multiple normalization methods, so that an appropriate normalization can be applied under different scenarios; (ii) a sequence of three statistical tests that eliminates background regions and accounts for different sources of variation; and (iii) adjustment for multiple testing to control the false discovery rate (FDR) or the family-wise type I error rate. EpiCenter can perform multiple analytic tasks, including: (i) identifying genome-wide epigenetic changes or differentially expressed genes, (ii) finding transcription factor binding sites and (iii) converting multiple-sample sequencing data into a single read-count data matrix. By simulation, we show that the framework consistently achieves a low FDR over a broad range of read coverage and biological variation. Through two real examples, we demonstrate the effectiveness of the framework and the use of the tool. In particular, we show that our novel and robust 'parsimony' normalization method is superior to the widely used 'tagRatio' method. EpiCenter is freely available to the public.
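As an illustration of the multiple-testing adjustment mentioned in feature (iii), the following minimal Python sketch computes Bonferroni-adjusted p-values (family-wise type I error control) and Benjamini-Hochberg adjusted p-values (FDR control) for a list of per-region p-values. The abstract does not state which procedures EpiCenter implements, so the choice of these two standard procedures, the function names, and the example p-values are all assumptions made for illustration; this is not EpiCenter's code.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for four candidate regions (illustrative only).
    example = [0.01, 0.04, 0.03, 0.005]
    print("Bonferroni:", bonferroni(example))
    print("Benjamini-Hochberg:", benjamini_hochberg(example))
```

A region would then be reported as changed when its adjusted p-value falls below the chosen error level. Bonferroni is the more conservative of the two, which is why FDR control is often preferred when many regions are tested genome-wide.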