Department of Statistics, and Department of Biostatistics and Medical Informatics, 1300 University Avenue, Madison, WI 53706, USA.
Bioinformatics. 2014 Mar 15;30(6):753-60. doi: 10.1093/bioinformatics/btt200. Epub 2013 May 10.
ChIP-seq technology enables investigators to study genome-wide binding of transcription factors and to map epigenomic marks. Although basic analysis tools for ChIP-seq data are rapidly becoming available, there has been little progress on the related design issues. A challenging question in designing a ChIP-seq experiment is how deeply the ChIP and control samples should be sequenced. The answer depends on multiple factors, some of which can be set by the experimenter based on pilot or preliminary data. The sequencing depth of a ChIP-seq experiment is one of the key factors that determine whether all the underlying targets (e.g. binding locations or epigenomic profiles) can be identified at a target power.
We developed a statistical framework named CSSP (ChIP-seq Statistical Power) for power calculations in ChIP-seq experiments, based on a local Poisson model of the kind commonly adopted by peak callers. Evaluations with simulations and data-driven computational experiments demonstrate that this framework can reliably estimate the power of a ChIP-seq experiment at different sequencing depths based on pilot data. Furthermore, it provides an analytical approach for calculating the depth required to reach a target power while controlling the false discovery rate at a user-specified level. Hence, our results enable researchers to use their own or publicly available data to determine the sequencing depths their ChIP-seq experiments require, and potentially to make better use of the multiplexing functionality of sequencers. Evaluation of power for multiple public ChIP-seq datasets indicates that typical current ChIP-seq studies are well powered for detecting large fold changes of ChIP enrichment over the control sample, but have considerably less power for detecting smaller fold changes.
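To make the idea of a depth-dependent power calculation under a local Poisson model concrete, here is a minimal single-bin sketch in Python. It is not the CSSP implementation and makes several simplifying assumptions: one bin whose background (non-enriched) mean read count at pilot depth is `lambda0`, a ChIP mean that is `fold` times that background, a change in sequencing depth that scales both means linearly by a multiplier `scale`, and a fixed per-bin significance level `alpha` used in place of the false discovery rate control described in the abstract. The function names `critical_value`, `power_at_depth` and `required_depth` are hypothetical.

```python
import math
from typing import Optional


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu), evaluated in log space to avoid overflow."""
    if mu == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def critical_value(mu0: float, alpha: float) -> int:
    """Smallest count c with P(X >= c | mu0) <= alpha (one-sided Poisson test)."""
    if not 1e-10 <= alpha < 1.0:
        raise ValueError("alpha must lie in [1e-10, 1)")
    # Far beyond the mean the true tail is negligible; this bound guarantees termination
    # even if floating-point error keeps `tail` slightly above alpha.
    limit = int(mu0 + 50 * math.sqrt(mu0) + 50)
    k, tail = 0, 1.0  # invariant: tail == P(X >= k | mu0)
    while tail > alpha and k <= limit:
        tail -= poisson_pmf(k, mu0)
        k += 1
    return k


def power_at_depth(lambda0: float, fold: float, scale: float, alpha: float) -> float:
    """Power to detect a bin enriched `fold` times over background at depth `scale`."""
    mu0 = scale * lambda0          # null (background) mean at the new depth
    mu1 = scale * fold * lambda0   # alternative (enriched) mean at the new depth
    c = critical_value(mu0, alpha)
    # Power = P(X >= c | mu1) = 1 - P(X <= c - 1 | mu1).
    cdf = sum(poisson_pmf(i, mu1) for i in range(c))
    return max(0.0, 1.0 - cdf)


def required_depth(lambda0: float, fold: float, alpha: float, target: float,
                   step: float = 0.25, max_scale: float = 20.0) -> Optional[float]:
    """Smallest depth multiplier on a grid whose power reaches `target`, else None."""
    n_steps = int(round(max_scale / step))
    for i in range(1, n_steps + 1):
        scale = i * step
        if power_at_depth(lambda0, fold, scale, alpha) >= target:
            return scale
    return None


if __name__ == "__main__":
    # Toy inputs: 1 background read per bin at pilot depth, 5-fold enrichment.
    print(power_at_depth(1.0, 5.0, 1.0, 0.05))
    print(required_depth(1.0, 5.0, 0.05, 0.8))
```

With these toy inputs the pilot depth gives roughly 73% power, and the first grid point reaching 80% is a 1.25× depth multiplier. The grid scan is a deliberate choice: because Poisson counts are discrete, power in this simple model is not strictly monotone in depth, so the sketch searches rather than inverting a formula. The abstract states that CSSP provides an analytical approach instead, which this sketch does not reproduce.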
CSSP is available at www.stat.wisc.edu/~zuo/CSSP.
Supplementary data are available at Bioinformatics online.